Using regex to capture exactly 20 characters

hharvey
Explorer

I need to create a field extraction that extracts ONLY the first 20 characters of the error message in a log. I've got the regex that extracts the full error:

rex "\#[\w0-9\W]{9}\:\s(?P!ERROR[^\\*]+)"

FYI, in my regex above, !ERROR stands for <error> (no space); the text editor removes anything after a < even when I use the code sample option.
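The Python sketch below shows what that full-message pattern captures. It makes two assumptions. First, `!ERROR` stands for a named group `<error>`, as the note above says. Second, Python's `re` module reads this pattern the way Splunk's PCRE-based `rex` does; Python also supports the `(?P<name>...)` syntax. The sample line is the first line of the logs in this post.

```python
import re

# Full-message pattern from the question, ASSUMING "!ERROR" stands for the named group <error>.
# [\w0-9\W]{9} matches any 9 characters (the ID after "#"). In Python, [^\\*]+ then takes every
# following character that is neither a backslash nor an asterisk; Splunk's own handling of
# backslashes inside a quoted search string may differ.
FULL = re.compile(r"\#[\w0-9\W]{9}\:\s(?P<error>[^\\*]+)")

# First log line from the question.
line = (
    "s1-sn701:2012-08-14 09:55:09,723 INFO  [STDOUT] [ERROR] 2012-08-14 09:55:09"
    "           LP::ThisController - #aWMfOOXSL: EAL: ASYNC: in async payment,"
    " could not create items, api returned 320"
)

full_error = FULL.search(line).group("error")
print(full_error)  # EAL: ASYNC: in async payment, could not create items, api returned 320
```

The unbounded `+` is why the whole rest of the message ends up in the field.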

Is there a regex that will capture only the first 20 characters as the field <error>? Here are the logs in question, followed by examples of the field data I am trying to extract.

I feel like I may be able to use the substr function with eval, but I'm not sure of the correct format. This doesn't seem to work:

rex "\#[\w0-9\W]{9}\:\s(?P!ERROR[^\\*]+)" | top 100 error | eval error=substr("error", 1, 20)
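The likely culprit in that eval is the quotes. In Splunk's eval, double-quoted text is a string literal. `substr("error", 1, 20)` therefore truncates the word "error" itself, not the `error` field. The field should be referenced unquoted, as in `substr(error, 1, 20)`.

There is also an ordering issue. Running the eval after `top` truncates values that have already been counted. Messages that differ only after character 20 then show up as separate rows with the same label. If the goal is to count by the first 20 characters, `... | eval error=substr(error, 1, 20) | top 100 error` seems closer.

The Python sketch below mimics both points. It uses a hypothetical `substr` helper that follows Splunk's 1-based start index. It is an illustration, not Splunk's implementation.

```python
from collections import Counter


def substr(text: str, start: int, length: int) -> str:
    """Hypothetical stand-in for Splunk's eval substr(): 1-based start, positive start only."""
    if start < 1:
        raise ValueError("this sketch only handles a positive, 1-based start")
    return text[start - 1:start - 1 + length]


# A full error message as the rex would extract it (taken from the sample logs).
error = "Error decoding or storing lat/long, exception was 'undefined method `[]' for nil:NilClass'"

print(substr("error", 1, 20))  # quoted: the literal word is truncated -> error
print(substr(error, 1, 20))    # unquoted: the field value -> Error decoding or st

# Two messages from the logs that differ only by a comma after character 20.
messages = [
    "EAL: ASYNC: in async payment, could not create items, api returned 320",
    "EAL: ASYNC: in async payment could not create items, api returned 320",
]
# Truncate after counting (like eval after top): two rows with the same label.
after = [(substr(m, 1, 20), n) for m, n in Counter(messages).most_common()]
# Truncate before counting (like eval before top): one row with count 2.
before = Counter(substr(m, 1, 20) for m in messages).most_common()
print(after)   # [('EAL: ASYNC: in async', 1), ('EAL: ASYNC: in async', 1)]
print(before)  # [('EAL: ASYNC: in async', 2)]
```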

s1-sn701:2012-08-14 09:55:09,723 INFO  [STDOUT] [ERROR] 2012-08-14 09:55:09           LP::ThisController - #aWMfOOXSL: EAL: ASYNC: in async payment, could not create items, api returned 320
s1-sn903:2012-08-14 07:01:34,169 INFO  [STDOUT] [ERROR] 2012-08-14 07:01:34           LP::OfferController - #dN'Fi<<Od: Error decoding or storing lat/long, exception was 'undefined method `[]' for nil:NilClass'
s1-sn902:2012-08-14 01:33:23,562 INFO  [STDOUT] [ERROR] 2012-08-14 01:33:23           UI::ReportController - #fm7e(n$2J: API returned 952 error for report data
s1-sn902:2012-08-14 01:11:31,431 INFO  [STDOUT] [ERROR] 2012-08-14 01:11:31           LP::ThisController - #9['?rp`fY: PAYKEY from payment data is blank or missing on item page
s1-sn902:2012-08-14 01:11:31,430 INFO  [STDOUT] [ERROR] 2012-08-14 01:11:31           LP::ThisController - #9['?rp`fY: PAYKEY from session is blank or missing on item page
s1-sn902:2012-08-14 00:15:16,746 INFO  [STDOUT] [ERROR] 2012-08-14 00:15:16           LP::ThisController - #Xq5Bez;vF: Attempting to purchase item that is expired
s1-sn701:2012-08-13 23:55:22,969 INFO  [STDOUT] [ERROR] 2012-08-13 23:55:22           LP::OfferController - #\)F3XjY_v: PAYKEY is blank or missing on item page
s1-sn701:2012-08-13 23:29:31,458 INFO  [STDOUT] [ERROR] 2012-08-13 23:29:31           LP::ThisController - #z|gXWQY1S: EAL: ASYNC: in async payment could not create items, api returned 320
s1-sn902:2012-08-13 12:40:13,350 INFO  [STDOUT] [ERROR] 2012-08-13 12:40:13           UI::Rails - #ErS;=x*'): Failed to get [1]https://aurl.url.com/v1/85/pp/accounting/  [2]betsy@betsyklein.com/
s1-sn902:2012-08-13 12:40:13,349 INFO  [STDOUT] [ERROR] 2012-08-13 12:40:13           UI::Rails - #ErS;=x*'): ["classpath:/META-INF/jruby.home/lib/ruby/1.8/uri/common.rb:436:in `split'"
s1-sn902:2012-08-13 12:40:13,347 INFO  [STDOUT] [ERROR] 2012-08-13 12:40:13           UI::Rails - #ErS;=x*'): -----------------------------
s1-sn902:2012-08-13 12:40:13,346 INFO  [STDOUT] [ERROR] 2012-08-13 12:40:13           UI::Rails - #ErS;=x*'): bad URI(is not URI?): [3]https://aurl.url.com/85/bills/pp/accounting/  [4]uname@aurl.com/
s1-sn902:2012-08-13 12:40:13,346 INFO  [STDOUT] [ERROR] 2012-08-13 12:40:13           UI::Rails - #ErS;=x*'): Oops, an error occured!

Example of data I want to extract as the error field:

EAL: ASYNC: in async
Error decoding or st
API returned 952 err
PAYKEY from payment 
PAYKEY from session 
Attempting to purcha
PAYKEY is blank or m
EAL: ASYNC: in async
Failed to get [1]htt
["classpath:/META-IN
--------------------
bad URI(is not URI?)
Oops, an error occur

kristian_kolb
Ultra Champion

Hmm, that was a bit hard to read... 🙂

First, do you NEED the full error message? If not, you can just alter the rex to capture up to 20 characters:

rex "\#[\w0-9\W]{9}:\s(?P<ERROR>[^\\*]{1,20})"
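To sanity-check the bounded quantifier outside Splunk, here is a small Python sketch. It assumes Python's `re` matches this pattern the same way Splunk's PCRE does. The group is named `ERROR` as above. The lines are trimmed copies of the question's logs, kept from just before the `#`, which is all the pattern looks at. The last line is a made-up short message, not taken from the logs.

```python
import re

# Bounded pattern: "#", any 9 ID characters, ":", whitespace, then at most 20 characters.
UP_TO_20 = re.compile(r"\#[\w0-9\W]{9}:\s(?P<ERROR>[^\\*]{1,20})")

lines = [
    # Tails of log lines from the question; the pattern only looks from "#" onward.
    "LP::ThisController - #aWMfOOXSL: EAL: ASYNC: in async payment, could not create items, api returned 320",
    "UI::ReportController - #fm7e(n$2J: API returned 952 error for report data",
    "LP::OfferController - #\\)F3XjY_v: PAYKEY is blank or missing on item page",
    "UI::Rails - #ErS;=x*'): Oops, an error occured!",
    # Hypothetical line (not from the logs) with a message shorter than 20 characters.
    "Hypothetical - #ABCDEFGHI: Timeout",
]

# {1,20} is greedy: it takes 20 characters when available and fewer otherwise.
results = [UP_TO_20.search(line).group("ERROR") for line in lines]
for value in results:
    print(repr(value))
```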

Also, you could probably make it a bit easier on the eye like this:

rex "\#.{9}:\s(?P<ERROR>.{20})"

if the messages themselves are always at least 20 characters long.
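That condition matters because `.{20}` requires all 20 characters to be present. On a shorter message the whole match fails, and the field is not extracted at all. The Python sketch below shows this. It assumes Python's `re` treats `.` like PCRE does here, matching any character except a newline. The short line is a made-up example, not from the logs.

```python
import re

# Simpler pattern: "#", any 9 characters, ":", whitespace, then exactly 20 characters.
EXACT_20 = re.compile(r"\#.{9}:\s(?P<ERROR>.{20})")

# Tail of a log line from the question.
long_line = "UI::Rails - #ErS;=x*'): bad URI(is not URI?): [3]https://aurl.url.com/85/bills/pp/accounting/"
# Hypothetical line with a 7-character message.
short_line = "Hypothetical - #ABCDEFGHI: Timeout"

long_match = EXACT_20.search(long_line)
short_match = EXACT_20.search(short_line)

print(long_match.group("ERROR"))  # bad URI(is not URI?)
print(short_match)                # None: fewer than 20 characters follow ": "
```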

Hope this helps,

Kristian

kristian_kolb
Ultra Champion

Well, you already used that construct at the start of your regex: the {9} 🙂
Since you said EXACTLY 20 characters, it's probably more correct to use {20} instead of {1,20}, but that's your decision.

/k

hharvey
Explorer

Thanks Kristian! Adding {1,20} did it; I just didn't realize that was an option in regex.

I agree, my post was pretty hard to read through. Sorry!
