<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Transforms.conf regex performance in Splunk Search</title>
    <link>https://community.splunk.com/t5/Splunk-Search/Transforms-conf-regex-performance/m-p/484045#M135507</link>
    <description>&lt;P&gt;If you're using transforms to route events, there is no extraction happening.  All you need to do is identify which events get indexed and which do not.&lt;/P&gt;</description>
    <pubDate>Fri, 17 Jan 2020 01:13:49 GMT</pubDate>
    <dc:creator>richgalloway</dc:creator>
    <dc:date>2020-01-17T01:13:49Z</dc:date>
    <item>
      <title>Transforms.conf regex performance</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Transforms-conf-regex-performance/m-p/484038#M135500</link>
      <description>&lt;P&gt;I am trying to capture the logging of any martian packets on a Linux system, so I set up a monitor on /var/log/messages and created a transform that sends only the messages related to martian packets to the indexQueue. I wrote this regex:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;\w{1,4}\s+\d{0,2}\s+[01][0-9]:[0-5][0-9]:[0-5][0-9]\s+[a-z]+\s+kernel:\s+martian\s+source\s+\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\s+from\s+\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3},\s+on\s+dev\s\w+\n.+
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;Is this overkill for performance purposes, and would it even work? I have read that the more detailed a regex is, the better it performs. Since that file logs the majority of the kernel messages (and on this system I only care about martian packets), I want to make sure the transform doesn't slow down the receiving indexer.&lt;/P&gt;

&lt;P&gt;Thoughts and comments? Thanks!&lt;/P&gt;</description>
      <pubDate>Thu, 16 Jan 2020 17:36:57 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Transforms-conf-regex-performance/m-p/484038#M135500</guid>
      <dc:creator>ricotries</dc:creator>
      <dc:date>2020-01-16T17:36:57Z</dc:date>
    </item>
    <item>
      <title>Re: Transforms.conf regex performance</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Transforms-conf-regex-performance/m-p/484039#M135501</link>
      <description>&lt;P&gt;The best way to know whether a regex is good is to paste some sample martian packet events into &lt;A href="http://www.regex101.com"&gt;www.regex101.com&lt;/A&gt; along with your regex.  It will tell you the number of steps it takes to accomplish the extraction.  I would then save it on that website and share the link in your question so people have some sample data to work with.&lt;/P&gt;</description>
      <pubDate>Thu, 16 Jan 2020 18:22:31 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Transforms-conf-regex-performance/m-p/484039#M135501</guid>
      <dc:creator>dmarling</dc:creator>
      <dc:date>2020-01-16T18:22:31Z</dc:date>
    </item>
    <item>
      <title>Re: Transforms.conf regex performance</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Transforms-conf-regex-performance/m-p/484040#M135502</link>
      <description>&lt;P&gt;Doesn't Splunk use Perl? I don't see it as an engine option on that website.&lt;/P&gt;</description>
      <pubDate>Thu, 16 Jan 2020 19:34:04 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Transforms-conf-regex-performance/m-p/484040#M135502</guid>
      <dc:creator>ricotries</dc:creator>
      <dc:date>2020-01-16T19:34:04Z</dc:date>
    </item>
    <item>
      <title>Re: Transforms.conf regex performance</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Transforms-conf-regex-performance/m-p/484041#M135503</link>
      <description>&lt;P&gt;Per their documentation they use PCRE: &lt;CODE&gt;Splunk regular expressions are PCRE (Perl Compatible Regular Expressions) and use the PCRE C library.&lt;/CODE&gt;&lt;BR /&gt;
&lt;A href="https://docs.splunk.com/Documentation/Splunk/8.0.1/Knowledge/AboutSplunkregularexpressions"&gt;https://docs.splunk.com/Documentation/Splunk/8.0.1/Knowledge/AboutSplunkregularexpressions&lt;/A&gt;&lt;/P&gt;

&lt;P&gt;Traditionally, the PCRE (PHP) engine on regex101.com is used for regex troubleshooting with Splunk. It has been extremely accurate in my own use and for other Splunk users on this board.&lt;/P&gt;</description>
      <pubDate>Thu, 16 Jan 2020 19:37:31 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Transforms-conf-regex-performance/m-p/484041#M135503</guid>
      <dc:creator>dmarling</dc:creator>
      <dc:date>2020-01-16T19:37:31Z</dc:date>
    </item>
    <item>
      <title>Re: Transforms.conf regex performance</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Transforms-conf-regex-performance/m-p/484042#M135504</link>
      <description>&lt;P&gt;This is the link with a very simple regex:&lt;BR /&gt;
&lt;A href="https://regex101.com/r/grB83o/1"&gt;https://regex101.com/r/grB83o/1&lt;/A&gt;&lt;BR /&gt;
If you check the debugger, it runs thousands of steps if there are many logs that don't match the pattern.&lt;/P&gt;

&lt;P&gt;This is the link with the regex I posted (with some alterations):&lt;BR /&gt;
&lt;A href="https://regex101.com/r/16yOf7/1"&gt;https://regex101.com/r/16yOf7/1&lt;/A&gt;&lt;/P&gt;

&lt;P&gt;How do I force the regex to give up on a line as soon as it fails to match the pattern, instead of looping over the same line trying to find anything that matches? (The question makes more sense if you check the debugger on the steps that go through lines that do not match the pattern.)&lt;/P&gt;</description>
      <pubDate>Thu, 16 Jan 2020 20:06:10 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Transforms-conf-regex-performance/m-p/484042#M135504</guid>
      <dc:creator>ricotries</dc:creator>
      <dc:date>2020-01-16T20:06:10Z</dc:date>
    </item>
    <item>
      <title>Re: Transforms.conf regex performance</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Transforms-conf-regex-performance/m-p/484043#M135505</link>
      <description>&lt;P&gt;If all you care about are martian events then just look for the text that identifies it.  Everything else is just wasted processing.  This string is only 518 steps: &lt;CODE&gt;server\skernel:\smartian\ssource\s&lt;/CODE&gt;.&lt;/P&gt;
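
&lt;P&gt;As a rough sketch of how that short pattern could be used for routing (the stanza names &lt;CODE&gt;setnull&lt;/CODE&gt; and &lt;CODE&gt;keep_martian&lt;/CODE&gt; and the class name &lt;CODE&gt;martian&lt;/CODE&gt; are made up for illustration, and the source path assumes the /var/log/messages monitor from the question), props.conf would apply two transforms in order:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;# props.conf (on the instance that parses the data)
[source::/var/log/messages]
TRANSFORMS-martian = setnull, keep_martian
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;and transforms.conf would first send every event to the nullQueue, then route the martian events back to the indexQueue:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;# transforms.conf
# Discard everything by default.
[setnull]
REGEX = .
DEST_KEY = queue
FORMAT = nullQueue

# Keep only events containing the martian marker text.
[keep_martian]
REGEX = server\skernel:\smartian\ssource\s
DEST_KEY = queue
FORMAT = indexQueue
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;The order in the TRANSFORMS line matters here: the last transform that matches an event decides its queue.&lt;/P&gt;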

&lt;P&gt;I disagree with the notion that detailed regexes perform better.  Here is an example to disprove it.&lt;/P&gt;</description>
      <pubDate>Thu, 16 Jan 2020 20:44:59 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Transforms-conf-regex-performance/m-p/484043#M135505</guid>
      <dc:creator>richgalloway</dc:creator>
      <dc:date>2020-01-16T20:44:59Z</dc:date>
    </item>
    <item>
      <title>Re: Transforms.conf regex performance</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Transforms-conf-regex-performance/m-p/484044#M135506</link>
      <description>&lt;P&gt;Wouldn't that only extract the segments that match the expression? I am trying to extract the entire line so I can identify timestamps and IP addresses, as well as the following line (which is why I add '\n.+' at the end of the expression).&lt;/P&gt;</description>
      <pubDate>Thu, 16 Jan 2020 20:54:13 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Transforms-conf-regex-performance/m-p/484044#M135506</guid>
      <dc:creator>ricotries</dc:creator>
      <dc:date>2020-01-16T20:54:13Z</dc:date>
    </item>
    <item>
      <title>Re: Transforms.conf regex performance</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Transforms-conf-regex-performance/m-p/484045#M135507</link>
      <description>&lt;P&gt;If you're using transforms to route events, there is no extraction happening.  All you need to do is identify which events get indexed and which do not.&lt;/P&gt;</description>
      <pubDate>Fri, 17 Jan 2020 01:13:49 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Transforms-conf-regex-performance/m-p/484045#M135507</guid>
      <dc:creator>richgalloway</dc:creator>
      <dc:date>2020-01-17T01:13:49Z</dc:date>
    </item>
    <item>
      <title>Re: Transforms.conf regex performance</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Transforms-conf-regex-performance/m-p/484046#M135508</link>
      <description>&lt;P&gt;You can modify your rsyslogd/syslog-ng configuration on the Linux host to write the events you are interested in to a separate file, then monitor that file with a UF.&lt;/P&gt;

&lt;P&gt;Old rsyslogd format:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;:msg, contains, "kernel: martian source" -/var/log/martian.log
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;New rsyslogd format:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;if $msg contains 'kernel: martian source' then /var/log/martian.log
&lt;/CODE&gt;&lt;/PRE&gt;
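
&lt;P&gt;On the UF, a minimal inputs.conf stanza for that file might look like this (the sourcetype and index values are assumptions, so use whatever fits your environment):&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;# inputs.conf on the universal forwarder
[monitor:///var/log/martian.log]
# Assumed values; adjust to your own sourcetype and index.
sourcetype = syslog
index = main
disabled = false
&lt;/CODE&gt;&lt;/PRE&gt;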

&lt;P&gt;Don't forget to add logrotate configuration (copy /etc/logrotate.d/syslog to /etc/logrotate.d/martian and modify accordingly) so the martian.log will be rotated and at some point deleted.&lt;/P&gt;</description>
      <pubDate>Fri, 17 Jan 2020 11:03:36 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Transforms-conf-regex-performance/m-p/484046#M135508</guid>
      <dc:creator>PavelProstine</dc:creator>
      <dc:date>2020-01-17T11:03:36Z</dc:date>
    </item>
    <item>
      <title>Re: Transforms.conf regex performance</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Transforms-conf-regex-performance/m-p/484047#M135509</link>
      <description>&lt;P&gt;I did not know that. That is actually very helpful!&lt;/P&gt;</description>
      <pubDate>Fri, 17 Jan 2020 12:18:04 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Transforms-conf-regex-performance/m-p/484047#M135509</guid>
      <dc:creator>ricotries</dc:creator>
      <dc:date>2020-01-17T12:18:04Z</dc:date>
    </item>
  </channel>
</rss>

