How can I build regex for specific field extraction?

Path Finder

Hello everyone,

I am sure this is a relatively easy regex to build, but I was hoping for some assistance; my regex experience is still pretty rocky 🙂

One of my log values contains more information than I need, but the first "section" of the value is what I really want to pull out. I can say that 100% of the time the value I want to extract is followed by today's date along with some more information. For instance:

Field=value02132018additionaltext

In the above example, I only care about the "value" part and want to strip everything after it. Is this possible?

Thanks

1 Solution

SplunkTrust

Hi @bcarr12,

Can you please try rex in sed mode?

Run-anywhere search:

| makeresults
| eval field1="value02132018additionaltext"
| rex field=field1 mode=sed "s/([a-z]*)\d+\w+/\1/g"

So your query will be:

<yourBasesearch> | rex field=<FIELDNAME> mode=sed "s/([a-z]*)\d+\w+/\1/g"


Super Champion

Try this run-anywhere search:

|makeresults|eval t="value02132018additionaltext"
|rex field=t "(?<a>[^\d]+)" 

Ultra Champion

Can you provide some more clarity on what kind of data is in the 'value' part? Otherwise it will be quite difficult to help you come up with a regex that is able to distinguish between the value part and the date and text.


Path Finder

The regex suggested elsewhere in this thread gets me very close:
rex field=myfield mode=sed "s/([a-z]*)\d+\w+/\1/g"

This command gets me the value I want 99% of the time, and in a few cases a little bit more. The "value" is alphanumeric and typically 6-8 characters long (never more than 8). It is followed by today's date and then some additional alphanumeric characters.
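
Given that description (an alphanumeric value of at most 8 characters, then a date), one option is to anchor the match on something date-shaped rather than on "any digits". The sketch below is only an illustration in Python's re module, used here to check the pattern outside Splunk. The mmddyyyy layout with a 20xx year, the sample strings, and the helper name extract_value are assumptions, not details confirmed in the thread. In rex the named groups would be written (?<value>...) rather than Python's (?P<value>...).

```python
import re

# Assumptions: the date is mmddyyyy with a 20xx year, and the value is
# 1-8 ASCII letters or digits at the very start of the field.
PATTERN = re.compile(
    r"^(?P<value>[A-Za-z0-9]{1,8}?)"                             # shortest prefix...
    r"(?P<date>(?:0[1-9]|1[0-2])(?:0[1-9]|[12]\d|3[01])20\d{2})"  # ...followed by a date
)

def extract_value(field):
    """Return the leading value, or None if no date-shaped run follows it."""
    m = PATTERN.match(field)
    return m.group("value") if m else None

print(extract_value("value02132018additionaltext"))   # value
print(extract_value("abc12302132018additionaltext"))  # abc123
```

A value whose own trailing digits happen to form a plausible date can still fool this, so it is worth checking against real events before relying on it.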


Path Finder

Thank you! This gets me very close to where I need to be. I think if a condition is added to this to recognize that the value "ends" with a comma it will work properly. Right now the extraction works correctly 99% of the time but in some cases also extracts some extra info at the end of the complete value. So the field/value pair I am extracting is:

field=valuemmddyyyyadditionaltext, nextfield=nextvalue

It pulls the value out correctly almost every time, but in a handful of cases it includes some additional characters from the "additionaltext" part. So I think if the regex could be set to ignore everything in the value beginning with the date code and ending with the comma, it will be exactly what I need (just the initial value and nothing else).


SplunkTrust

Try this:

<yourBasesearch> | rex field=<FIELDNAME> mode=sed "s/([a-z]*)\d+\w+\,/\1/g"


Path Finder

Hmm, this causes the extraction to pull out more than is needed. Thanks for your help; you've put me on the right path, so I can work on this a bit more to fine-tune it. Thanks again!


Path Finder

Looks like this did the trick. I changed \w+ to \S+:

rex field=<FIELDNAME> mode=sed "s/([a-z]*)\d+\S+/\1/g"
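
One possible explanation for the difference, sketched in Python's re module rather than in Splunk: \w+ stops at the first character that is not a letter, digit, or underscore, so anything after that point is left in the field, whereas \S+ keeps consuming until it reaches whitespace. The sample value with a hyphen in the trailing text is hypothetical; the thread does not show the events that misbehaved.

```python
import re

# Hypothetical field value: the hyphen stands in for any non-word character
# that might appear in the trailing text.
sample = "value02132018extra-text"

# \w+ stops at the hyphen, so "-text" survives the substitution.
with_w = re.sub(r"([a-z]*)\d+\w+", r"\1", sample)

# \S+ runs to the next whitespace (here, the end of the string).
with_S = re.sub(r"([a-z]*)\d+\S+", r"\1", sample)

print(with_w)  # value-text
print(with_S)  # value
```

Note that both patterns still assume the value consists only of lowercase letters, because of the [a-z]* group.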


Legend

Hi bcarr12,
if you're sure that the value part never contains eight consecutive digits, you can use:

| rex field=Field "(?<my_field>.*)\d{8}"

as you can test at https://regex101.com/r/glpagP/1

Bye.
Giuseppe
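
One caveat worth checking with the pattern above: .* is greedy, so it backs off from the end of the string and settles on the last run of eight digits, not the first. The sketch below uses Python's re module as a stand-in for rex (Python spells named groups (?P<name>...) where rex uses (?<name>...)). The sample string with a long digit run in the trailing text is hypothetical, and the anchored lazy variant is a suggestion, not something proposed in the thread.

```python
import re

# Hypothetical value whose trailing text also contains 8 or more digits.
sample = "value02132018ref1234567890"

# Greedy: captures as much as possible while 8 digits still follow.
greedy = re.search(r"(?P<my_field>.*)\d{8}", sample).group("my_field")

# Anchored and lazy: captures as little as possible before the first 8 digits.
lazy = re.search(r"^(?P<my_field>.*?)\d{8}", sample).group("my_field")

print(greedy)  # value02132018ref12
print(lazy)    # value
```

The lazy form has the opposite weakness: if the value itself ends in digits, it can stop too early.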
