How to optimize the regular expression for our rex statement to extract Java errors from our sample data?

JDukeSplunk
Builder

We have a really nasty regex that runs against a customized version of a Tomcat log. The rex finds certain strings within the _raw data and grabs the last part of the error message. I am looking for a more elegant solution, ideally one that will not kill the search heads. If we find one that is good enough, we can move it out of the inline search and into transforms.conf/props.conf.

 |rex field=_raw "(com.pega.apache.http.conn|java.sql|com.pega.pegarules.pub.clipboard|java.net|com.pega.pegarules.pub.services|com.pega.pegarules.pub.context|com.pega.pegarules.pub| com.pega.pegarules.pub.database|com.pega.pegarules.pub.generator|java.lang|com.sun.jersey.api.client).(?<type>\w+)(\s|:)"

The number of periods varies from event to event, and the class name sometimes ends with a space and sometimes with a colon (:).

Some examples of the source. Highlighted are the bits we are currently extracting.

2016-08-13 23:59:58,956 [ttp-bio-8005-exec-12] [ STANDARD] [ ] Portal:01.50 ERROR TTAPPPEGAAPP05.company.com|172.22.101.10|HTTP|Recommendation|Recommendations|Recommend|AF64722BFA23E77DEE185E39B3A281D0C - java.util.concurrent.ExecutionException: java.lang.RuntimeException: Cancelled

2016-08-13 21:59:01,300 [http-bio-8004-exec-1] [ STANDARD] [ ] Portal:01.50 ERROR TTAPPPEGAAPP05.company.com|172.22.101.10|HTTP|PortalFeatures|Services|PostChallengeData|A660C7C3D30428FBD26529DE9859DEB5F - LookupList : error reading from file file://llc:/LLC/Rule-Obj-FieldValue/getFieldValue.xml. java.io.IOException: Exception 'com.pega.pegarules.pub.clipboard.InvalidStreamError: Invalid clipboard stream detected in module com.pega.pegarules.data.internal.clipboard.XMLStream.new

2016-08-13 14:32:56,776 [http-bio-8001-exec-5] [ STANDARD] [ ] PHSInt:01.01 ERROR TTAPPPEGAAPP02.company.com|172.22.101.10|HTTP|AssessmentServices|Services|SaveAssessmentAnswers|AEFBBD97AEE6CED837A732AD77C6C437F - Exception
com.pega.pegarules.pub.PRRuntimeException: Unable to identify default schema for the connection to Device_Staging
at com.pega.pegarules.data.internal.access.DatabaseTableImpl.getSchemaName(DatabaseTableImpl.java:360)
at com.pega.pegarules.data.internal.access.DatabaseTableImpl.getFullyQualifiedTableName(DatabaseTableImpl.java:416)
at com.pega.pegarules.data.internal.access.rdb.SQLParser.directive(SQLParser.java:653)

2016-08-13 13:17:25,746 [http-bio-8003-exec-5] [ STANDARD] [ ] PHSInt:01.01 ERROR TTAPPPEGAAPP02.company.com|172.22.101.10|HTTP|UserActivityInt|Services|SavePartUserActivityReq - HCIncentiveEvent failed for MemberEligID:69691976Params are ObjectiveID:103021210ActivityType:2::** Caught unhandled exception: java.net.SocketTimeoutException: Read timed out

2016-08-12 10:46:40,992 [http-bio-8003-exec-4] [ STANDARD] [ ] PHSInt:01.01 ERROR TTAPPPEGAAPP08.company.com|172.22.101.10|HTTP|MessageCenter|Services|SavePtNotifPreferences|A32A2BB43A9ABBCD410AAB8D6AC3D6FD3 - Not returning connection 2 for database "pegadata" to the pool as it previously encountered the following error
User ID: (unknown)
Last SQL: call SECUREMESSAGING_PKG.InsertUpdatePtPreference( ?, ?, ?, ?, ?, ?, ?, ? )
java.sql.SQLException: ORA-06502: PL/SQL: numeric or value error: character string buffer too small

mhpark
Path Finder

Judging only by the given examples, I would go with this:

 rex field=_raw "\.(?<error_type>[^\.\:]+(Exception|Error))\:"
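
To sanity-check the pattern outside Splunk, here is a minimal Python sketch run against the tails of three sample events from the question. It assumes Python's `re` module behaves like Splunk's PCRE for this particular pattern; the only change is that Python spells the named group `(?P<error_type>...)` rather than `(?<error_type>...)`. The helper name `extract_error_type` is made up for the illustration.

```python
import re

# Same pattern as the rex above; Python writes named groups as (?P<name>...).
PATTERN = re.compile(r"\.(?P<error_type>[^\.\:]+(Exception|Error))\:")

# Tails of three sample events from the question (shortened).
SAMPLES = [
    "AF64722BFA23E77DEE185E39B3A281D0C - java.util.concurrent.ExecutionException: "
    "java.lang.RuntimeException: Cancelled",
    "java.io.IOException: Exception 'com.pega.pegarules.pub.clipboard."
    "InvalidStreamError: Invalid clipboard stream detected",
    "Caught unhandled exception: java.net.SocketTimeoutException: Read timed out",
]


def extract_error_type(raw):
    """Return the first match only, as rex does with its default max_match=1."""
    match = PATTERN.search(raw)
    return match.group("error_type") if match else None


for raw in SAMPLES:
    print(extract_error_type(raw))
# prints:
# ExecutionException
# IOException
# SocketTimeoutException
```

Two things this makes visible. The first match in an event wins, so nested messages yield the outer exception (ExecutionException, IOException) rather than the inner one. And because of the `+`, a bare class name such as `java.lang.Exception:` leaves nothing for `[^\.\:]+` to consume, so it would not be extracted.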

gabriel_vasseur
Contributor

I like mhpark's answer, but I thought I would comment on your original regex too.

First, is your main problem with it performance or elegance? I think the job inspector might help measure the performance; there may be a line dedicated to regexes. If performance isn't an issue, then elegance should not keep you awake at night as much as maintainability should. In that respect, your regex isn't particularly nasty.

About the regex itself: first, all your dots should be escaped, especially the one outside the parentheses. As I'm sure you know, a dot matches any character, so for instance this part of your regex, (com.pega.pegarules.pub).(?<type>\w+)(\s|:), would match the string com.pega.pegarules.public: and extract "ic" as the type... 🙂
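
The unescaped-dot point can be reproduced with a small Python sketch. It is reduced to the single alternative discussed here and uses Python's `(?P<type>...)` spelling for the named group; the behaviour of `.` is the same in Splunk's PCRE.

```python
import re

# One alternative from the original regex, with and without escaped dots.
UNESCAPED = re.compile(r"(com.pega.pegarules.pub).(?P<type>\w+)(\s|:)")
ESCAPED = re.compile(r"(com\.pega\.pegarules\.pub)\.(?P<type>\w+)(\s|:)")

text = "com.pega.pegarules.public:"

loose = UNESCAPED.search(text)
strict = ESCAPED.search(text)

# The unescaped dot after "pub" consumes the "l", leaving "ic" for \w+.
print(loose.group("type"))  # prints: ic

# With literal dots there is no "." after "pub", so nothing matches.
print(strict)  # prints: None

# A genuine class name from the sample data still extracts as intended.
real = ESCAPED.search("com.pega.pegarules.pub.PRRuntimeException: Unable to identify")
print(real.group("type"))  # prints: PRRuntimeException
```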

You can also speed things up a bit by starting with a word boundary: \b(com\.pega\.apache\.http\.conn|java\.sql|........

Finally, you could regroup similar alternatives together. So for instance, you could replace com\.pega\.apache\.http\.conn|...|com\.pega\.pegarules\.pub\.clipboard with com\.pega\.(apache\.http\.conn|pegarules\.pub\.clipboard)|.... That should speed things up a bit, but again you need to benchmark it to see if it's worth the loss in readability.
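
As a rough check that the regrouping does not change what gets extracted, here is a Python sketch comparing a flat and a factored version of two of the original prefixes, both anchored with `\b`. The second test line is hypothetical (it does not appear in the sample data), and the sketch only checks equivalence; it says nothing about speed, which still needs measuring in Splunk itself.

```python
import re

# Two of the original prefixes, written flat and then with the common
# "com.pega." part factored out. Named group uses Python's (?P<...>) form.
FLAT = re.compile(
    r"\b(com\.pega\.apache\.http\.conn|com\.pega\.pegarules\.pub\.clipboard)"
    r"\.(?P<type>\w+)(\s|:)"
)
GROUPED = re.compile(
    r"\bcom\.pega\.(apache\.http\.conn|pegarules\.pub\.clipboard)"
    r"\.(?P<type>\w+)(\s|:)"
)

LINES = [
    # From the sample data in the question.
    "Exception 'com.pega.pegarules.pub.clipboard.InvalidStreamError: Invalid clipboard stream",
    # Hypothetical line, invented to exercise the other alternative.
    "com.pega.apache.http.conn.ConnectTimeoutException: connect timed out",
    # Not covered by either prefix, so neither pattern should match.
    "java.net.SocketTimeoutException: Read timed out",
]


def extracted_types(pattern):
    """Return the extracted type for each line, or None where nothing matches."""
    results = []
    for line in LINES:
        match = pattern.search(line)
        results.append(match.group("type") if match else None)
    return results


print(extracted_types(FLAT))
print(extracted_types(GROUPED))
# both print: ['InvalidStreamError', 'ConnectTimeoutException', None]
```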

That's assuming you're not going with something a lot simpler (but is it faster? :-P) like mhpark suggested.

mhpark
Path Finder

Listing all your terms explicitly would certainly be faster.
I was assuming there might be cases that the prefixes already listed would not cover.

Thank you for your comment 🙂

JDukeSplunk
Builder

Thanks guys. mhpark's works pretty well, although it extracts some of the exceptions a little differently than the original, and it does run faster.

Gabe,

I like your comment, but the flexibility of not having to update the preceding strings every time a new one is added made me shy away from it; that was another of my goals. So if tomorrow a new error showed up under java.some.bs.string.like.this, I wouldn't have to edit the dashboards/reports to catch it.

-JD

gabriel_vasseur
Contributor

Yes, that is best. I mostly commented for the educational value!

gabriel_vasseur
Contributor

That's a good point, I don't know how easy it is to gather an exhaustive list.
