<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: How to optimize the regular expression for our rex statement to extract Java errors from our sample data? in Splunk Search</title>
    <link>https://community.splunk.com/t5/Splunk-Search/How-to-optimize-the-regular-expression-for-our-rex-statement-to/m-p/233897#M69508</link>
    <description>&lt;P&gt;Thanks, guys. mhpark's regex works pretty well and it is faster, although it extracts some of the exceptions a little differently than the original.&lt;/P&gt;

&lt;P&gt;Gabe,&lt;/P&gt;

&lt;P&gt;I like your comment, but I shied away from it because I wanted the flexibility of not having to update the preceding strings every time a new one is added, which was another of my goals. If a new error showed up tomorrow under java.some.bs.string.like.this, I wouldn't have to edit the dashboards/reports to catch it.&lt;/P&gt;

&lt;P&gt;-JD&lt;/P&gt;</description>
    <pubDate>Mon, 22 Aug 2016 16:26:25 GMT</pubDate>
    <dc:creator>JDukeSplunk</dc:creator>
    <dc:date>2016-08-22T16:26:25Z</dc:date>
    <item>
      <title>How to optimize the regular expression for our rex statement to extract Java errors from our sample data?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/How-to-optimize-the-regular-expression-for-our-rex-statement-to/m-p/233892#M69503</link>
      <description>&lt;P&gt;So, we have a really nasty regex that runs against a customized version of a Tomcat log. The rex finds certain strings within the _raw data and grabs the last bit of the error message. I am looking for a more elegant solution, and one that is less likely to kill the search heads. If we find one that is good enough, we can move it out of the inline search and into props.conf/transforms.conf. &lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt; |rex field=_raw "(com.pega.apache.http.conn|java.sql|com.pega.pegarules.pub.clipboard|java.net|com.pega.pegarules.pub.services|com.pega.pegarules.pub.context|com.pega.pegarules.pub| com.pega.pegarules.pub.database|com.pega.pegarules.pub.generator|java.lang|com.sun.jersey.api.client).(?&amp;lt;type&amp;gt;\w+)(\s|:)"
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;The number of periods varies from event to event, and the class name sometimes ends with a space and sometimes with a &lt;CODE&gt;:&lt;/CODE&gt;&lt;/P&gt;

&lt;P&gt;Some examples of the source. Highlighted are the bits we are currently extracting. &lt;/P&gt;

&lt;P&gt;2016-08-13 23:59:58,956 [ttp-bio-8005-exec-12] [  STANDARD] [                    ] &lt;A href="https://community.splunk.com/.generated.phsbustier_memcache"&gt;   Portal:01.50&lt;/A&gt; ERROR TTAPPPEGAAPP05.company.com|172.22.101.10|HTTP|Recommendation|Recommendations|Recommend|AF64722BFA23E77DEE185E39B3A281D0C  - java.util.concurrent.ExecutionException: java.lang.&lt;STRONG&gt;RuntimeException&lt;/STRONG&gt;: Cancelled&lt;/P&gt;

&lt;P&gt;
2016-08-13 21:59:01,300 [http-bio-8004-exec-1] [  STANDARD] [                    ] &lt;A href="https://community.splunk.com/s.generated.pega_wb_lookuplist"&gt;   Portal:01.50&lt;/A&gt; ERROR TTAPPPEGAAPP05.company.com|172.22.101.10|HTTP|PortalFeatures|Services|PostChallengeData|A660C7C3D30428FBD26529DE9859DEB5F  - LookupList : error reading from file file://llc:/LLC/Rule-Obj-FieldValue/&lt;EM&gt;getFieldValue&lt;/EM&gt;.xml. java.io.IOException: Exception 'com.pega.pegarules.pub.clipboard.&lt;STRONG&gt;InvalidStreamError&lt;/STRONG&gt;: Invalid clipboard stream detected in module com.pega.pegarules.data.internal.clipboard.XMLStream.new&lt;/P&gt;

&lt;P&gt;2016-08-13 14:32:56,776 [http-bio-8001-exec-5] [  STANDARD] [                    ] &lt;A href="https://community.splunk.com/internal.mgmt.Executable"&gt;        PHSInt:01.01&lt;/A&gt; ERROR TTAPPPEGAAPP02.company.com|172.22.101.10|HTTP|AssessmentServices|Services|SaveAssessmentAnswers|AEFBBD97AEE6CED837A732AD77C6C437F  - Exception&lt;BR /&gt;
com.pega.pegarules.pub.&lt;STRONG&gt;PRRuntimeException&lt;/STRONG&gt;: Unable to identify default schema for the connection to Device_Staging&lt;BR /&gt;
    at com.pega.pegarules.data.internal.access.DatabaseTableImpl.getSchemaName(DatabaseTableImpl.java:360)&lt;BR /&gt;
    at com.pega.pegarules.data.internal.access.DatabaseTableImpl.getFullyQualifiedTableName(DatabaseTableImpl.java:416)&lt;BR /&gt;
    at com.pega.pegarules.data.internal.access.rdb.SQLParser.directive(SQLParser.java:653)&lt;/P&gt;

&lt;P&gt;2016-08-13 13:17:25,746 [http-bio-8003-exec-5] [  STANDARD] [                    ] &lt;A href="https://community.splunk.com/pollo_Data_UserActivity.Action"&gt;        PHSInt:01.01&lt;/A&gt; ERROR TTAPPPEGAAPP02.company.com|172.22.101.10|HTTP|UserActivityInt|Services|SavePartUserActivityReq  - HCIncentiveEvent failed for MemberEligID:69691976Params are ObjectiveID:103021210ActivityType:2::** Caught unhandled exception: java.net.&lt;STRONG&gt;SocketTimeoutException&lt;/STRONG&gt;: Read timed out&lt;/P&gt;

&lt;P&gt;2016-08-12 10:46:40,992 [http-bio-8003-exec-4] [  STANDARD] [                    ] &lt;A href="https://community.splunk.com/l.access.ConnectionManagerImpl"&gt;        PHSInt:01.01&lt;/A&gt; ERROR TTAPPPEGAAPP08.company.com|172.22.101.10|HTTP|MessageCenter|Services|SavePtNotifPreferences|A32A2BB43A9ABBCD410AAB8D6AC3D6FD3  - Not returning connection 2 for database "pegadata" to the pool as it previously encountered the following error&lt;BR /&gt;
User ID: (unknown)&lt;BR /&gt;
Last SQL: call SECUREMESSAGING_PKG.InsertUpdatePtPreference(       ?,       ?,       ?,       ?,          ?,       ?,       ?,       ? )&lt;BR /&gt;
java.sql.&lt;STRONG&gt;SQLException&lt;/STRONG&gt;: ORA-06502: PL/SQL: numeric or value error: character string buffer too small&lt;/P&gt;</description>
      <pubDate>Fri, 19 Aug 2016 20:04:22 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/How-to-optimize-the-regular-expression-for-our-rex-statement-to/m-p/233892#M69503</guid>
      <dc:creator>JDukeSplunk</dc:creator>
      <dc:date>2016-08-19T20:04:22Z</dc:date>
    </item>
    <item>
      <title>Re: How to optimize the regular expression for our rex statement to extract Java errors from our sample data?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/How-to-optimize-the-regular-expression-for-our-rex-statement-to/m-p/233893#M69504</link>
      <description>&lt;P&gt;Judging only by the given examples, I would go with this:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt; rex field=_raw "\.(?&amp;lt;error_type&amp;gt;[^\.\:]+(Exception|Error))\:"
&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Fri, 19 Aug 2016 23:31:26 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/How-to-optimize-the-regular-expression-for-our-rex-statement-to/m-p/233893#M69504</guid>
      <dc:creator>mhpark</dc:creator>
      <dc:date>2016-08-19T23:31:26Z</dc:date>
    </item>
    <item>
      <title>Re: How to optimize the regular expression for our rex statement to extract Java errors from our sample data?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/How-to-optimize-the-regular-expression-for-our-rex-statement-to/m-p/233894#M69505</link>
      <description>&lt;P&gt;I like mhpark's answer, but I thought I would comment on your original regex too.&lt;/P&gt;

&lt;P&gt;First, is your main problem with it performance or elegance? I think the job inspector might help measure the performance; maybe there's a line dedicated to regexes. If performance isn't an issue, then elegance should not keep you awake at night as much as maintainability should. In that respect, your regex isn't particularly nasty.&lt;/P&gt;

&lt;P&gt;About the regex itself: first of all, all your dots should be escaped, especially the one outside the parentheses. As I'm sure you know, an unescaped dot matches any character, so for instance this bit of your regex: &lt;CODE&gt;(com.pega.pegarules.pub).(?&amp;lt;type&amp;gt;\w+)(\s|:)&lt;/CODE&gt; would match the string &lt;CODE&gt;com.pega.pegarules.public:&lt;/CODE&gt; and extract "ic" as a type... &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;/P&gt;
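
&lt;P&gt;To see the difference for yourself, here is a minimal sketch you could paste into a search bar. It assumes &lt;CODE&gt;makeresults&lt;/CODE&gt; is available in your Splunk version, and the field names &lt;CODE&gt;loose&lt;/CODE&gt; and &lt;CODE&gt;strict&lt;/CODE&gt; are made up for the illustration:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;| makeresults
| eval _raw="com.pega.pegarules.public: some message"
| rex field=_raw "(com.pega.pegarules.pub).(?&amp;lt;loose&amp;gt;\w+)(\s|:)"
| rex field=_raw "com\.pega\.pegarules\.pub\.(?&amp;lt;strict&amp;gt;\w+)(\s|:)"
| table _raw loose strict
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;The unescaped version should fill &lt;CODE&gt;loose&lt;/CODE&gt; with "ic", while the escaped version should leave &lt;CODE&gt;strict&lt;/CODE&gt; empty because no real package separator follows "pub".&lt;/P&gt;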

&lt;P&gt;You can also speed things up a bit by starting with a word boundary: &lt;CODE&gt;\b(com\.pega\.apache\.http\.conn|java\.sql|.......&lt;/CODE&gt;.&lt;/P&gt;

&lt;P&gt;Finally, you could regroup similar alternatives together. So for instance, you could replace &lt;CODE&gt;com\.pega\.apache\.http\.conn|...|com\.pega\.pegarules\.pub\.clipboard&lt;/CODE&gt; with &lt;CODE&gt;com\.pega\.(apache\.http\.conn|pegarules\.pub\.clipboard)|...&lt;/CODE&gt;. That should speed things up a bit, but again you need to benchmark it to see if it's worth the loss in readability.&lt;/P&gt;
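
&lt;P&gt;Putting the escaping, the word boundary and the grouping together, a rough sketch might look like the search below. Several things here are assumptions rather than part of your original: the non-capturing groups &lt;CODE&gt;(?:...)&lt;/CODE&gt; are an addition so the extra parentheses do not capture anything, the space before &lt;CODE&gt;com.pega.pegarules.pub.database&lt;/CODE&gt; is dropped on the guess that it is a typo, and the test event is a shortened line from your samples:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;| makeresults
| eval _raw="java.sql.SQLException: ORA-06502: PL/SQL: numeric or value error"
| rex field=_raw "\b(?:com\.(?:pega\.(?:apache\.http\.conn|pegarules\.pub(?:\.(?:clipboard|services|context|database|generator))?)|sun\.jersey\.api\.client)|java\.(?:sql|net|lang))\.(?&amp;lt;type&amp;gt;\w+)(?:\s|:)"
| table _raw type
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;This should extract SQLException as &lt;CODE&gt;type&lt;/CODE&gt;. Whether it is actually faster than the flat list is something to check in the job inspector rather than assume.&lt;/P&gt;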

&lt;P&gt;That's assuming you're not going with something a lot simpler (but is it faster? :-P) like mhpark suggested.&lt;/P&gt;</description>
      <pubDate>Mon, 22 Aug 2016 09:03:05 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/How-to-optimize-the-regular-expression-for-our-rex-statement-to/m-p/233894#M69505</guid>
      <dc:creator>gabriel_vasseur</dc:creator>
      <dc:date>2016-08-22T09:03:05Z</dc:date>
    </item>
    <item>
      <title>Re: How to optimize the regular expression for our rex statement to extract Java errors from our sample data?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/How-to-optimize-the-regular-expression-for-our-rex-statement-to/m-p/233895#M69506</link>
      <description>&lt;P&gt;Writing out all your terms would be faster for sure.&lt;BR /&gt;
I was assuming there might be cases that the given prefixes would not cover.&lt;/P&gt;

&lt;P&gt;Thank you for your comment &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 22 Aug 2016 13:47:52 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/How-to-optimize-the-regular-expression-for-our-rex-statement-to/m-p/233895#M69506</guid>
      <dc:creator>mhpark</dc:creator>
      <dc:date>2016-08-22T13:47:52Z</dc:date>
    </item>
    <item>
      <title>Re: How to optimize the regular expression for our rex statement to extract Java errors from our sample data?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/How-to-optimize-the-regular-expression-for-our-rex-statement-to/m-p/233896#M69507</link>
      <description>&lt;P&gt;That's a good point, I don't know how easy it is to gather an exhaustive list.&lt;/P&gt;</description>
      <pubDate>Mon, 22 Aug 2016 13:53:48 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/How-to-optimize-the-regular-expression-for-our-rex-statement-to/m-p/233896#M69507</guid>
      <dc:creator>gabriel_vasseur</dc:creator>
      <dc:date>2016-08-22T13:53:48Z</dc:date>
    </item>
    <item>
      <title>Re: How to optimize the regular expression for our rex statement to extract Java errors from our sample data?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/How-to-optimize-the-regular-expression-for-our-rex-statement-to/m-p/233897#M69508</link>
      <description>&lt;P&gt;Thanks, guys. mhpark's regex works pretty well and it is faster, although it extracts some of the exceptions a little differently than the original.&lt;/P&gt;
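
&lt;P&gt;For example, on a line with chained exceptions the two patterns pick different classes. A quick sketch, assuming &lt;CODE&gt;makeresults&lt;/CODE&gt; is available, with the original prefix list cut down to three entries for brevity and the event trimmed to the tail of the first sample:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;| makeresults
| eval _raw="java.util.concurrent.ExecutionException: java.lang.RuntimeException: Cancelled"
| rex field=_raw "(java.sql|java.net|java.lang).(?&amp;lt;type&amp;gt;\w+)(\s|:)"
| rex field=_raw "\.(?&amp;lt;error_type&amp;gt;[^\.\:]+(Exception|Error))\:"
| table type error_type
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;Here the original pattern should return RuntimeException in &lt;CODE&gt;type&lt;/CODE&gt;, because java.util.concurrent is not in the prefix list, while mhpark's should return ExecutionException in &lt;CODE&gt;error_type&lt;/CODE&gt;, because it takes the first class name ending in Exception or Error.&lt;/P&gt;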

&lt;P&gt;Gabe,&lt;/P&gt;

&lt;P&gt;I like your comment, but I shied away from it because I wanted the flexibility of not having to update the preceding strings every time a new one is added, which was another of my goals. If a new error showed up tomorrow under java.some.bs.string.like.this, I wouldn't have to edit the dashboards/reports to catch it.&lt;/P&gt;

&lt;P&gt;-JD&lt;/P&gt;</description>
      <pubDate>Mon, 22 Aug 2016 16:26:25 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/How-to-optimize-the-regular-expression-for-our-rex-statement-to/m-p/233897#M69508</guid>
      <dc:creator>JDukeSplunk</dc:creator>
      <dc:date>2016-08-22T16:26:25Z</dc:date>
    </item>
    <item>
      <title>Re: How to optimize the regular expression for our rex statement to extract Java errors from our sample data?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/How-to-optimize-the-regular-expression-for-our-rex-statement-to/m-p/233898#M69509</link>
      <description>&lt;P&gt;Yes, that is best. I mostly commented for the educational value!&lt;/P&gt;</description>
      <pubDate>Tue, 23 Aug 2016 07:43:28 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/How-to-optimize-the-regular-expression-for-our-rex-statement-to/m-p/233898#M69509</guid>
      <dc:creator>gabriel_vasseur</dc:creator>
      <dc:date>2016-08-23T07:43:28Z</dc:date>
    </item>
  </channel>
</rss>

