Solved: How can I build regex for specific field extraction?

bcarr12 · 02-13-2018

Hello everyone,

I am sure this is a relatively easy regex to build, but I was hoping for some assistance; my regex experience is still pretty rocky 🙂

One of my log values contains more information than I need; the first "section" of the value is what I really want to pull out. The value I want to extract is always followed by today's date and then some more information. For instance:

Field=value02132018additionaltext

In the above example, "value" is the only part I care about, and I want to strip everything after it. Is this possible?

Thanks


493669 · 02-13-2018

Try this run-anywhere search:

| makeresults
| eval t="value02132018additionaltext"
| rex field=t "(?<a>[^\d]+)"
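The same capture can be tried outside Splunk with a minimal Python sketch. This assumes Python's standard re module as a stand-in for Splunk's PCRE-based rex; Python spells the named group (?P<a>...) instead of (?<a>...). The pattern keeps the first run of non-digit characters, so it only works while the value itself contains no digits. The second sample, with the alphanumeric value ab12cd, is hypothetical.

```python
import re
from typing import Optional

# Python equivalent of: rex field=t "(?<a>[^\d]+)"
# Python's re module needs the (?P<name>...) form for named groups.
NON_DIGIT_PREFIX = re.compile(r"(?P<a>[^\d]+)")


def extract_non_digit_prefix(text: str) -> Optional[str]:
    """Return the first run of non-digit characters (rex uses the first match)."""
    match = NON_DIGIT_PREFIX.search(text)
    return match.group("a") if match else None


print(extract_non_digit_prefix("value02132018additionaltext"))  # value
# Hypothetical alphanumeric value "ab12cd": the capture stops at the first digit.
print(extract_non_digit_prefix("ab12cd02132018additionaltext"))  # ab
```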

FrankVl · 02-13-2018

Can you give more detail on what kind of data is in the 'value' part? Otherwise it will be quite difficult to help you come up with a regex that can distinguish the value part from the date and the text that follows.

bcarr12 · 02-13-2018

The regex provided a little further down in this thread gets me very close:
rex field=myfield mode=sed "s/([a-z]*)\d+\w+/\1/g"

This command gets me the value I want 99% of the time, but in a few cases it captures a little bit more. The "value" part is alphanumeric and typically 6-8 characters long (not always 8, but never more than 8). It is followed by today's date and then some additional alphanumeric characters.

harsmarvania57 · 02-13-2018

Hi @bcarr12,

Can you please try a regex in sed mode?

Run-anywhere search:

| makeresults
| eval field1="value02132018additionaltext"
| rex field=field1 mode=sed "s/([a-z]*)\d+\w+/\1/g"

So your query will be:

<yourBasesearch> | rex field=<FIELDNAME> mode=sed "s/([a-z]*)\d+\w+/\1/g"
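With mode=sed, rex rewrites the field in place instead of extracting a new one. A rough Python equivalent uses re.sub as an assumed stand-in for Splunk's sed mode: ([a-z]*) captures the leading lowercase letters, \d+ consumes the date, \w+ consumes the trailing word characters, and the whole match is replaced with group 1.

```python
import re

# Python equivalent of: rex mode=sed "s/([a-z]*)\d+\w+/\1/g"
# re.sub replaces every non-overlapping match, mirroring the trailing /g flag.
SED_PATTERN = re.compile(r"([a-z]*)\d+\w+")


def strip_date_and_suffix(value: str) -> str:
    """Keep the captured lowercase letters; drop the date and trailing word chars."""
    return SED_PATTERN.sub(r"\1", value)


print(strip_date_and_suffix("value02132018additionaltext"))  # value
```

Because [a-z]* matches only lowercase letters, a hypothetical alphanumeric value such as ab12cd would be cut down to ab: the \d+ step fires on the first digits inside the value rather than on the date.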

bcarr12 · 02-13-2018

Thank you! This gets me very close to where I need to be. I think if a condition is added to recognize that the value "ends" with a comma, it will work properly. Right now the extraction works correctly 99% of the time, but in some cases it also extracts some extra info at the end of the complete value. The field/value pair I am extracting looks like this:

field=valuemmddyyyyadditionaltext, nextfield=nextvalue

It pulls the value out correctly almost every time, but in a handful of cases it includes some characters from the "additionaltext" part. So I think if the regex could ignore everything from the date code up to the comma, it would be exactly what I need (just the initial value and nothing else).

harsmarvania57 · 02-13-2018

Try this:

<yourBasesearch> | rex field=<FIELDNAME> mode=sed "s/([a-z]*)\d+\w+\,/\1/g"

bcarr12 · 02-13-2018

Hmm, this causes the extraction to pull out more than is needed. Thanks for your help; you've put me on the right path, so I can work on this a bit more to fine-tune it. Thanks again!

bcarr12 · 02-13-2018

Looks like this did the trick: I changed \w+ to \S+.

rex field=<FIELDNAME> mode=sed "s/([a-z]*)\d+\S+/\1/g"
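A small Python comparison (again using re.sub as an assumed stand-in for sed mode) suggests why \S+ helped. \w+ stops at the first non-word character, so anything after a hyphen or period in the trailing text survives the substitution, while \S+ runs on to the next whitespace. The comma-terminated variant only matches when the field value actually contains a comma, so on a value without one nothing is stripped at all. The sample value02132018abc-def is hypothetical; the thread does not show the events that failed.

```python
import re

# The three sed-mode patterns from this thread; each match is replaced with group 1.
PATTERNS = {
    "word": re.compile(r"([a-z]*)\d+\w+"),      # original answer
    "comma": re.compile(r"([a-z]*)\d+\w+,"),    # variant requiring a trailing comma (\, in SPL)
    "nonspace": re.compile(r"([a-z]*)\d+\S+"),  # bcarr12's final fix
}


def apply_sed(name: str, value: str) -> str:
    """Apply one of the sed-style substitutions to a field value."""
    return PATTERNS[name].sub(r"\1", value)


sample = "value02132018abc-def"  # hypothetical value with a hyphen in the suffix
print(apply_sed("word", sample))      # value-def
print(apply_sed("comma", sample))     # value02132018abc-def (no comma, so no match)
print(apply_sed("nonspace", sample))  # value
```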

gcusello · 02-13-2018

Hi bcarr12,
if you're sure that the value field never contains eight consecutive digits, you can use:

| rex field=Field "(?<my_field>.*)\d{8}"

as you can test at https://regex101.com/r/glpagP/1
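This pattern relies on greedy .* backtracking: it captures everything up to the rightmost point that is still followed by eight digits. A Python sketch (with the named group rewritten as (?P<my_field>...), since Python's re module does not accept (?<name>...)) shows that a value ending in digits is still kept intact, and where the caveat about eight consecutive digits bites. Both extra samples are hypothetical.

```python
import re
from typing import Optional

# Python equivalent of: rex field=Field "(?<my_field>.*)\d{8}"
# Greedy .* backtracks only as far as needed for eight digits to follow.
BEFORE_DATE = re.compile(r"(?P<my_field>.*)\d{8}")


def extract_before_date(text: str) -> Optional[str]:
    """Return everything before the rightmost 8-digit run, or None if there is none."""
    match = BEFORE_DATE.search(text)
    return match.group("my_field") if match else None


print(extract_before_date("value02132018additionaltext"))  # value
# Hypothetical value "ab1" that ends in a digit: greedy .* still stops before the date.
print(extract_before_date("ab102132018xyz"))  # ab1
# Hypothetical suffix containing eight more digits: the capture overshoots.
print(extract_before_date("value02132018ref12345678"))  # value02132018ref
```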

Bye.
Giuseppe
