<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Help in optimising a horrible regex (46K+ steps) in Splunk Search</title>
    <link>https://community.splunk.com/t5/Splunk-Search/Help-in-optimising-a-horrible-regex-46K-steps/m-p/693102#M235863</link>
    <description>&lt;P&gt;Try this&amp;nbsp;&lt;A href="https://regex101.com/r/rlI3Xl/2" target="_blank"&gt;https://regex101.com/r/rlI3Xl/2&lt;/A&gt;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;| rex field=source_hostname "(?i)^AZ(?&amp;lt;cap1&amp;gt;[A-Z0-9-]+?)(?=\1[A-Z0-9]{6})(?&amp;lt;temp_hostname4&amp;gt;\1[A-Z0-9]{6})-\d{10}-VMSS$"&lt;/LI-CODE&gt;</description>
    <pubDate>Thu, 11 Jul 2024 17:30:05 GMT</pubDate>
    <dc:creator>ITWhisperer</dc:creator>
    <dc:date>2024-07-11T17:30:05Z</dc:date>
    <item>
      <title>Help in optimising a horrible regex (46K+ steps)</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Help-in-optimising-a-horrible-regex-46K-steps/m-p/693079#M235860</link>
      <description>&lt;P&gt;Hi Splunkers,&lt;BR /&gt;&lt;BR /&gt;I am trying to extract a substring that is repeated within a longer string, with a prefix and a suffix added around it. Only the very start and end of the string are static values ('AZ-' and '-VMSS').&lt;/P&gt;
&lt;P&gt;Example data:&lt;/P&gt;
&lt;P&gt;AZ-203-dev-app-1-build-agents-203-dev-app-1-build-agents0006GA-1720624093-VMSS&lt;/P&gt;
&lt;P&gt;AZ-eun-dev-005-pqu-ado-vmss-eun-dev-005-pqu-ado-vmss005X89-1720625975-VMSS&lt;/P&gt;
&lt;P&gt;AZ-DEV-CROSS-SUBSCRIPTION-PROXY-EUN-BLUE-DEV-CROSS-SUBSCRIPTION-PROXY-EUN-BLUE000000-1720637733-VMSS&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I have a working rex command to extract the relevant data (temp_hostname4):&lt;/P&gt;
&lt;LI-CODE lang="markup"&gt;| rex field=source_hostname "(?i)^AZ(?&amp;lt;cap1&amp;gt;(-[A-Z0-9]+)+)(?=\1[A-Z0-9]{6})-(?&amp;lt;temp_hostname4&amp;gt;([A-Z0-9]+-?)+)-\d{10}-VMSS$"&lt;/LI-CODE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Which correctly extracts:&lt;/P&gt;
&lt;P&gt;203-dev-app-1-build-agents0006GA&lt;/P&gt;
&lt;P&gt;eun-dev-005-pqu-ado-vmss005X89&lt;/P&gt;
&lt;P&gt;DEV-CROSS-SUBSCRIPTION-PROXY-EUN-BLUE000000&lt;/P&gt;
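&lt;P&gt;As a quick check outside Splunk, the pattern above can be exercised with Python's re module. This is a minimal sketch, not part of the original search: it assumes numbered groups in place of the named ones (group 1 stands in for cap1, group 3 for temp_hostname4), and the helper name extract_hostname4 is hypothetical.&lt;/P&gt;

```python
import re

# Numbered-group transcription of the rex pattern above (an assumption of this
# sketch; the SPL original uses named groups). Group 1 plays the role of cap1
# and group 3 the role of temp_hostname4.
PATTERN = re.compile(
    r"^AZ((-[A-Z0-9]+)+)(?=\1[A-Z0-9]{6})-(([A-Z0-9]+-?)+)-\d{10}-VMSS$",
    re.IGNORECASE,
)

# The three example hostnames quoted in the post.
SAMPLES = [
    "AZ-203-dev-app-1-build-agents-203-dev-app-1-build-agents0006GA-1720624093-VMSS",
    "AZ-eun-dev-005-pqu-ado-vmss-eun-dev-005-pqu-ado-vmss005X89-1720625975-VMSS",
    "AZ-DEV-CROSS-SUBSCRIPTION-PROXY-EUN-BLUE-DEV-CROSS-SUBSCRIPTION-PROXY-EUN-BLUE000000-1720637733-VMSS",
]


def extract_hostname4(source_hostname):
    """Return the repeated name plus its 6-character suffix, or None."""
    match = PATTERN.match(source_hostname)
    return match.group(3) if match else None


if __name__ == "__main__":
    for sample in SAMPLES:
        print(extract_hostname4(sample))
```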
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;But let's face it, this is horrible! According to regex101 this takes 46K+ steps, which can't be nice for Splunk to apply to c.20K records several times per day.&lt;/P&gt;
&lt;P&gt;Can anyone suggest optimisations to bring that number down?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;For added complication (and for clarity to anyone reading this), it's temp_hostname&lt;STRONG&gt;4&lt;/STRONG&gt; because there are multiple other ways the hostname might have been... manipulated before it gets to Splunk, sometimes with the string repeated, sometimes not. That results in the following SPL. I could use coalesce rather than case, but that's hardly important right now, and separating the regex statements seemed like the saner thing to do in this instance &lt;span class="lia-unicode-emoji" title=":winking_face:"&gt;😉&lt;/span&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;
&lt;LI-CODE lang="markup"&gt;| rex field=source_hostname "(?i)^AZ(?&amp;lt;cap1&amp;gt;(-[A-Z0-9]+)+)(?=\1[A-Z0-9]{6})-(?&amp;lt;temp_hostname4&amp;gt;([A-Z0-9]+-?)+)-\d{10}-VMSS$"
| rex field=source_hostname "(?i)^AZ-(?&amp;lt;temp_hostname3&amp;gt;[^.]+)-\d{10}-VMSS$"
| rex field=source_hostname "(?i)^AZ-(?&amp;lt;temp_hostname2&amp;gt;[^.]+)-\d{10}$"
| rex field=source_hostname "(?i)^(?&amp;lt;temp_hostname1&amp;gt;[^.]+)_\d{10}$"

| eval alias_source_of=case(
!isnull(temp_hostname4), temp_hostname4,
!isnull(temp_hostname3), temp_hostname3,
!isnull(temp_hostname2), temp_hostname2,
!isnull(temp_hostname1), temp_hostname1,
1=1, null()
)&lt;/LI-CODE&gt;
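&lt;P&gt;The fallback logic of the SPL above (try the most specific pattern first, then fall through to the looser ones) can be sketched in Python roughly as follows. This is an illustration under assumptions, not the Splunk implementation: numbered groups replace the named ones, and the FALLBACKS table and the function name alias_source_of are choices made for this sketch.&lt;/P&gt;

```python
import re

# Patterns in priority order, mirroring temp_hostname4 down to temp_hostname1.
# Each entry is (compiled pattern, number of the group holding the hostname).
# Numbered groups replace the named groups of the SPL; that is this sketch's choice.
FALLBACKS = [
    (re.compile(r"^AZ((-[A-Z0-9]+)+)(?=\1[A-Z0-9]{6})-(([A-Z0-9]+-?)+)-\d{10}-VMSS$", re.I), 3),
    (re.compile(r"^AZ-([^.]+)-\d{10}-VMSS$", re.I), 1),
    (re.compile(r"^AZ-([^.]+)-\d{10}$", re.I), 1),
    (re.compile(r"^([^.]+)_\d{10}$", re.I), 1),
]


def alias_source_of(source_hostname):
    """Return the first successful extraction, or None if nothing matches."""
    for pattern, group in FALLBACKS:
        match = pattern.match(source_hostname)
        if match:
            return match.group(group)
    return None
```

&lt;P&gt;For example, with a hypothetical hostname that has no repeated section, alias_source_of("AZ-web-01-1720624093-VMSS") fails the first pattern, falls through to the temp_hostname3 pattern, and returns web-01.&lt;/P&gt;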
&lt;P&gt;Any suggestions for optimising the regex would be greatly appreciated.&lt;/P&gt;</description>
      <pubDate>Thu, 11 Jul 2024 15:59:41 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Help-in-optimising-a-horrible-regex-46K-steps/m-p/693079#M235860</guid>
      <dc:creator>Nikobobinus</dc:creator>
      <dc:date>2024-07-11T15:59:41Z</dc:date>
    </item>
    <item>
      <title>Re: Help in optimising a horrible regex (46K+ steps)</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Help-in-optimising-a-horrible-regex-46K-steps/m-p/693093#M235861</link>
      <description>&lt;PRE&gt;(?i)^AZ-(?&amp;lt;temp_hostname4&amp;gt;([-A-Z0-9]+))(?:[-A-Z0-9]+?)(?=\1).*-VMSS$&lt;/PRE&gt;&lt;P&gt;According to regex101, it matches your 3 events in slightly fewer than 13k steps (about 4k steps per event).&lt;/P&gt;&lt;P&gt;&lt;A href="https://regex101.com/r/8h4zwD/1" target="_blank"&gt;https://regex101.com/r/8h4zwD/1&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 11 Jul 2024 16:53:08 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Help-in-optimising-a-horrible-regex-46K-steps/m-p/693093#M235861</guid>
      <dc:creator>PickleRick</dc:creator>
      <dc:date>2024-07-11T16:53:08Z</dc:date>
    </item>
    <item>
      <title>Re: Help in optimising a horrible regex (46K+ steps)</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Help-in-optimising-a-horrible-regex-46K-steps/m-p/693102#M235863</link>
      <description>&lt;P&gt;Try this&amp;nbsp;&lt;A href="https://regex101.com/r/rlI3Xl/2" target="_blank"&gt;https://regex101.com/r/rlI3Xl/2&lt;/A&gt;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;| rex field=source_hostname "(?i)^AZ(?&amp;lt;cap1&amp;gt;[A-Z0-9-]+?)(?=\1[A-Z0-9]{6})(?&amp;lt;temp_hostname4&amp;gt;\1[A-Z0-9]{6})-\d{10}-VMSS$"&lt;/LI-CODE&gt;</description>
      <pubDate>Thu, 11 Jul 2024 17:30:05 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Help-in-optimising-a-horrible-regex-46K-steps/m-p/693102#M235863</guid>
      <dc:creator>ITWhisperer</dc:creator>
      <dc:date>2024-07-11T17:30:05Z</dc:date>
    </item>
    <item>
      <title>Re: Help in optimising a horrible regex (46K+ steps)</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Help-in-optimising-a-horrible-regex-46K-steps/m-p/693157#M235876</link>
      <description>&lt;P&gt;&lt;a href="https://community.splunk.com/t5/user/viewprofilepage/user-id/225168"&gt;@ITWhisperer&lt;/a&gt; and &lt;a href="https://community.splunk.com/t5/user/viewprofilepage/user-id/231884"&gt;@PickleRick&lt;/a&gt; thank you both very much! Technically PickleRick's answer reproduces the precise result more closely, but takes c.13K steps, while ITWhisperer's answer takes just 332 steps and leaves a leading hyphen (which is easy enough to strip out).&lt;/P&gt;&lt;P&gt;I'm going to accept ITWhisperer's as the solution for its efficiency, but wanted to call out that PickleRick's result, as a pure regex solution, is technically better.&lt;/P&gt;&lt;P&gt;Thank you both!&lt;/P&gt;</description>
      <pubDate>Fri, 12 Jul 2024 08:55:06 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Help-in-optimising-a-horrible-regex-46K-steps/m-p/693157#M235876</guid>
      <dc:creator>Nikobobinus</dc:creator>
      <dc:date>2024-07-12T08:55:06Z</dc:date>
    </item>
  </channel>
</rss>

