Splunk Search

Using regex to capture exactly 20 characters

hharvey
Explorer

I need to create a field extraction that extracts the first 20 characters ONLY from an error log; I've got the regex that extracts the full error:

rex "\#[\w0-9\W]{9}\:\s(?P!ERROR[^\\*]+)"

FYI, in my regex above, !ERROR stands for < error> (no space); the text editor removes anything after < even when using the code sample option.

Is there a regex that will capture only the first 20 characters as the field < error>? Here are the logs in question, along with an example of the field data I am trying to extract.

I feel like I may be able to use the substr command for eval, but not exactly sure of the correct format... this doesn't seem to work:

rex "\#[\w0-9\W]{9}\:\s(?P!ERROR[^\\*]+)" | top 100 error | eval error=substr("error", 1, 20)

s1-sn701:2012-08-14 09:55:09,723 INFO  [STDOUT] [ERROR] 2012-08-14 09:55:09           LP::ThisController - #aWMfOOXSL: EAL: ASYNC: in async payment, could not create items, api returned 320
s1-sn903:2012-08-14 07:01:34,169 INFO  [STDOUT] [ERROR] 2012-08-14 07:01:34           LP::OfferController - #dN'Fi<<Od: Error decoding or storing lat/long, exception was 'undefined method `[]' for nil:NilClass'
s1-sn902:2012-08-14 01:33:23,562 INFO  [STDOUT] [ERROR] 2012-08-14 01:33:23           UI::ReportController - #fm7e(n$2J: API returned 952 error for report data
s1-sn902:2012-08-14 01:11:31,431 INFO  [STDOUT] [ERROR] 2012-08-14 01:11:31           LP::ThisController - #9['?rp`fY: PAYKEY from payment data is blank or missing on item page
s1-sn902:2012-08-14 01:11:31,430 INFO  [STDOUT] [ERROR] 2012-08-14 01:11:31           LP::ThisController - #9['?rp`fY: PAYKEY from session is blank or missing on item page
s1-sn902:2012-08-14 00:15:16,746 INFO  [STDOUT] [ERROR] 2012-08-14 00:15:16           LP::ThisController - #Xq5Bez;vF: Attempting to purchase item that is expired
s1-sn701:2012-08-13 23:55:22,969 INFO  [STDOUT] [ERROR] 2012-08-13 23:55:22           LP::OfferController - #\)F3XjY_v: PAYKEY is blank or missing on item page
s1-sn701:2012-08-13 23:29:31,458 INFO  [STDOUT] [ERROR] 2012-08-13 23:29:31           LP::ThisController - #z|gXWQY1S: EAL: ASYNC: in async payment could not create items, api returned 320
s1-sn902:2012-08-13 12:40:13,350 INFO  [STDOUT] [ERROR] 2012-08-13 12:40:13           UI::Rails - #ErS;=x*'): Failed to get [1]https://aurl.url.com/v1/85/pp/accounting/  [2]betsy@betsyklein.com/
s1-sn902:2012-08-13 12:40:13,349 INFO  [STDOUT] [ERROR] 2012-08-13 12:40:13           UI::Rails - #ErS;=x*'): ["classpath:/META-INF/jruby.home/lib/ruby/1.8/uri/common.rb:436:in `split'"
s1-sn902:2012-08-13 12:40:13,347 INFO  [STDOUT] [ERROR] 2012-08-13 12:40:13           UI::Rails - #ErS;=x*'): -----------------------------
s1-sn902:2012-08-13 12:40:13,346 INFO  [STDOUT] [ERROR] 2012-08-13 12:40:13           UI::Rails - #ErS;=x*'): bad URI(is not URI?): [3]https://aurl.url.com/85/bills/pp/accounting/  [4]uname@aurl.com/
s1-sn902:2012-08-13 12:40:13,346 INFO  [STDOUT] [ERROR] 2012-08-13 12:40:13           UI::Rails - #ErS;=x*'): Oops, an error occured!

Example of data I want to extract as the error field:

EAL: ASYNC: in async
Error decoding or st
API returned 952 err
PAYKEY from payment 
PAYKEY from session 
Attempting to purcha
PAYKEY is blank or m
EAL: ASYNC: in async
Failed to get [1]htt
["classpath:/META-IN
--------------------
bad URI(is not URI?)
Oops, an error occur
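
The substr idea above amounts to capturing the full message and then keeping its first 20 characters. One likely reason the eval did not behave as hoped: in Splunk's eval, a double-quoted argument is a string literal, so substr("error", 1, 20) works on the text error rather than on the field; the unquoted form substr(error, 1, 20) refers to the field, and it would need to run before top for the counts to use the truncated value. The same capture-then-truncate idea is sketched below in Python, purely as an illustration. Python's re module is not Splunk's regex engine, the function name first_20 is hypothetical, and the sample line is a shortened copy of one event above.

```python
import re

# Full-message pattern from the question, with the named group written out.
FULL_ERROR = re.compile(r"\#[\w0-9\W]{9}\:\s(?P<error>[^\\*]+)")


def first_20(line):
    """Capture the whole error message, then keep its first 20 characters."""
    match = FULL_ERROR.search(line)
    if match is None:
        return None
    # The slice plays the role of substr(error, 1, 20) applied to the field.
    return match.group("error")[:20]


# Shortened copy of one event from the question (leading fields dropped).
line = "LP::ThisController - #Xq5Bez;vF: Attempting to purchase item that is expired"
print(first_20(line))
```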

kristian_kolb
Ultra Champion

Hmm, that was a bit hard to read... 🙂

First, do you NEED the full error message? If not, you can just alter the rex to capture up to 20 characters:

rex "\#[\w0-9\W]{9}:\s(?P<ERROR>[^\\*]{1,20})"

Also, you could probably make it a bit easier on the eye like this:

rex "\#.{9}:\s(?P<ERROR>.{20})"

if the messages themselves are always at least 20 chars long; with a shorter message, .{20} would not match at all.
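
To make the difference between the two quantifiers concrete, here is a minimal sketch that runs both patterns through Python's re module. This is only an approximation of what rex does in Splunk, since the engines are not identical. The helper name extract is hypothetical, the first sample line is a shortened copy of an event from the question, and the second line with the short message Timeout is invented for illustration.

```python
import re

# The two patterns from the answer, written as Python raw strings.
UP_TO_20 = re.compile(r"\#[\w0-9\W]{9}:\s(?P<ERROR>[^\\*]{1,20})")
EXACTLY_20 = re.compile(r"\#.{9}:\s(?P<ERROR>.{20})")


def extract(pattern, line):
    """Return the ERROR group, or None when the pattern does not match."""
    match = pattern.search(line)
    return match.group("ERROR") if match else None


# Shortened copy of an event from the question (leading fields dropped).
long_line = "UI::ReportController - #fm7e(n$2J: API returned 952 error for report data"
# Hypothetical event whose message is shorter than 20 characters.
short_line = "UI::ReportController - #fm7e(n$2J: Timeout"

for label, pattern in (("{1,20}", UP_TO_20), ("{20}", EXACTLY_20)):
    print(label, repr(extract(pattern, long_line)), repr(extract(pattern, short_line)))
```

On the long message both patterns capture the same 20 characters. On the short one, {1,20} still returns what is there, while {20} finds nothing.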

Hope this helps,

Kristian

kristian_kolb
Ultra Champion

Well, you used that construct in the beginning - the {9} 🙂
Since you said EXACTLY 20 characters, it's probably more correct to use {20} instead of {1,20}, but that's your decision.

/k

hharvey
Explorer

Thanks Kristian! Adding {1,20} did it; I just didn't realize that was an option in regex.

I agree, my post was pretty hard to read through. Sorry!
