Splunk Search

Transforms, Regex and wrong source names OH MY!

gnovak
Builder

No, this is no April Fools' joke. But it feels that way to me...

I'm trying to use transforms.conf and props.conf to change the way the source of a log file looks in Splunk when it is indexed.

However, sometimes the log file name is long and other times it is not. For example:

arcd_3278659_me00quc_cat-qu-mouse01.20140331
arcd_3268459_me04quc_rat-qu-mouse04.20140331
arcd_me00quc_cat-qu-mouse.20140331
arcd_me02quc_bat-qf-mouse.20140331
arcd_100_me00quc_cat-qu-mouse01.20140331

Normally there are digits between arcd and me00quc, but in some cases there are not! I want the source name to show up as:

arcd_me00quc_cat-qu-mouse
arcd_me04quc_rat-qu-mouse04
arcd_me02quc_bat-qf-mouse

I don't care about the trailing numbers at the end, and I don't care about the numbers between arcd and me00quc. This is where I'm having a problem renaming the source.

Transforms.conf:

[source_clean-digits-before-ext]
DEST_KEY   = MetaData:Source
SOURCE_KEY = MetaData:Source
REGEX = source::(.*)[-._]\d+[-._]([a-z0-9]+)[-._]([a-zA-Z0-9-]+)[-._]\d+$
FORMAT   = source::$1$3$4

Props.conf:

[arcd]
TRANSFORMS-fix_source = source_clean-digits-before-ext

When I turned this on, the source names showed up as:

arcdme00quc-batch01$4
arcd_me00quc_cat-qu-mouse01.20140401

I think this is messing up the regex. I'm wondering if there is a way to have the source name always show up as:

 arcd_me00quc_cat-qu-mouse01

I don't want the trailing numbers at the end, and if there are numbers in $2, I don't want those either! I'm playing around with this but thought hey, I'll post it here too because I'm having trouble trying to nail it down.

martin_mueller
SplunkTrust

For stripping possibly-existing 3+ digit numbers plus an underscore and 8 digit numbers plus a leading period from the source, you can adapt the stanza from this answer by making the REGEX more lenient:

REGEX = source::(.*?_)(?:\d{3,}_)?(.*?)(?:\.\d{8})?$
FORMAT = source::$1$2
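As a quick sanity check outside Splunk, here is a minimal Python sketch that runs the same pattern over the sample sources from the question. It assumes Python's `re` module behaves like Splunk's PCRE engine for this particular pattern, and the helper name `clean_source` is made up for illustration. It only imitates what REGEX plus FORMAT do; it is not how Splunk applies the transform internally.

```python
import re

# Same pattern as the REGEX line above.
PATTERN = re.compile(r"source::(.*?_)(?:\d{3,}_)?(.*?)(?:\.\d{8})?$")

def clean_source(value):
    """Hypothetical helper imitating FORMAT = source::$1$2."""
    match = PATTERN.search(value)
    if match is None:
        return value  # assumption: a non-matching value is left as it is
    return "source::" + match.group(1) + match.group(2)

# Sample file names taken from the question.
samples = [
    "arcd_3278659_me00quc_cat-qu-mouse01.20140331",
    "arcd_3268459_me04quc_rat-qu-mouse04.20140331",
    "arcd_me00quc_cat-qu-mouse.20140331",
    "arcd_me02quc_bat-qf-mouse.20140331",
    "arcd_100_me00quc_cat-qu-mouse01.20140331",
]

for name in samples:
    print(clean_source("source::" + name))
```

With these inputs, both the optional digit block after arcd_ and the 8-digit date suffix are dropped, whether or not the digit block is present.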

[Screenshot: the regex tested on regexr.com]

martin_mueller
SplunkTrust

The FORMAT is source::$1$2 rather than source::$1$3 because the two optional numbers-grabbing groups are non-capturing, as marked by the (?: opening sequence.
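To make the group numbering concrete, here is a small Python sketch that counts capturing groups. It assumes Python's `re` numbers groups the same way PCRE does for these patterns, and the variable names are only for illustration. The last pattern is the one from the question: it has three capturing groups, so `$4` in its FORMAT has nothing to refer to, which would be consistent with the literal `$4` showing up in the source name.

```python
import re

# Pattern from this answer: the two optional groups are non-capturing.
non_capturing = re.compile(r"source::(.*?_)(?:\d{3,}_)?(.*?)(?:\.\d{8})?$")

# Same pattern with plain capturing groups, for comparison only.
capturing = re.compile(r"source::(.*?_)(\d{3,}_)?(.*?)(\.\d{8})?$")

# Pattern from the question.
original = re.compile(
    r"source::(.*)[-._]\d+[-._]([a-z0-9]+)[-._]([a-zA-Z0-9-]+)[-._]\d+$"
)

sample = "source::arcd_100_me00quc_cat-qu-mouse01.20140331"

print(non_capturing.groups)  # number of capturing groups: 2
print(capturing.groups)      # number of capturing groups: 4
print(original.groups)       # number of capturing groups: 3

# With non-capturing groups the wanted pieces are groups 1 and 2 ...
m = non_capturing.search(sample)
print(m.group(1) + m.group(2))

# ... with capturing groups the same pieces would be groups 1 and 3.
m2 = capturing.search(sample)
print(m2.group(1) + m2.group(3))
```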

Converted 🙂

I wrote the regex into Splunk Answers directly... but I did indeed use regexr.com to make the screenshot.

gnovak
Builder

p.p.s. Did you use http://www.regexr.com/ for this? I will try it... seems like a very nice tool! I was not aware of that one. Thanks a bunch.

gnovak
Builder

p.s. You might want to post this as an answer so I can mark it as answered and you get credit for it!

gnovak
Builder

This actually worked out nicely.

tpederson
Path Finder

This might be overly simple, but how about this?

[source_clean-digits-before-ext]
DEST_KEY   = MetaData:Source
SOURCE_KEY = MetaData:Source
REGEX = source::(arcd_).*(me00quc_).*(cat-qu-mouse)
FORMAT   = source::$1$2$3

Of course you could simply always override the name, but this way it will only change the source name of something that contains those strings.
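A rough Python sketch of what that means in practice, assuming Python's `re` matches these patterns the way Splunk's engine does and that a source is left alone when the REGEX does not match. The helper name `rename` is made up for illustration.

```python
import re

# The hard-coded pattern from the stanza above.
hard_coded = re.compile(r"source::(arcd_).*(me00quc_).*(cat-qu-mouse)")

def rename(value):
    """Hypothetical helper imitating FORMAT = source::$1$2$3."""
    m = hard_coded.search(value)
    if m is None:
        return value  # assumption: non-matching sources stay untouched
    return "source::" + "".join(m.groups())

# Contains all three literal strings, so it gets renamed.
print(rename("source::arcd_3278659_me00quc_cat-qu-mouse01.20140331"))

# Different host and animal names, so the pattern does not match.
print(rename("source::arcd_3268459_me04quc_rat-qu-mouse04.20140331"))
```

The first source is reduced to the three literal pieces; the second is returned unchanged because me04quc and rat-qu-mouse04 are not in the pattern.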

gnovak
Builder

I tried something similar to this but it didn't work. I'm still trying different things here, but the issue is having Splunk know "hey, there are numbers or aren't numbers" before the trailing date at the end... and for Splunk to know "hey, drop the date at the end too". I also edited my question because the names can be different at times.
