How to edit my regular expression for a multivalue field extraction with new lines?

johnmvang
Path Finder

Hello,

I need regex help. I've spent almost all day on this and only came up with the pattern below, which is very sloppy. I feel it could be more efficient, and it doesn't work: when I plug it into the "I'll define my own regular expression" section of the Splunk field extractor, it doesn't do anything.

My Regex:

^Job Dependencies:\s*[([]*(\w+_\w+_\w+_\w+_\w+)[)\]]*|,\s+[([]*(\w+_\w+_\w+_\w+_\w+)[)\]]*,\n|\G\s*[([]*(\w+_\w+_\w+_\w+_\w+)[)\]]*,*

I only need the job dependencies. I know I need to turn them into a multivalue field so that the expected Splunk stats list output looks like this:

Job Name                             Job Dependencies
ABC_Job                              ABC_ABC_AB2_123_ABC123
                                     ABC_ABC_AB2_123_123ABC
                                     BCA_BCA_12A_ABC_123ABC
                                     DDD_AAA_CCC_12_123ABC

(I don't need help with the Splunk search; I'm just showing it so you know what I'm trying to achieve.)

Since the data also has a "Job Prerequisites:" section with similarly formatted data, my regex would capture that data as well, but I don't want it.

Please help. Sample data below:

Job Name :          Job ID:
ABC_Job              ADF123

Job Prerequisites: (ABC_ABC_AB2_123_ABC123, AB1_ABC_AB2_123_123ABC)

Job Dependencies: (ABC_ABC_AB2_123_ABC123, ABC_ABC_AB2_123_123ABC,
                  BCA_BCA_12A_ABC_123ABC, DDD_AAA_CCC_12_123ABC)

THERE'S A CATCH: Sometimes the "Job Dependencies" list could contain square brackets OR just one dependency, for example:

Job Dependencies: (ABC_ABC_AB2_123_ABC123, [ABC_ABC_AB2_123_123ABC],
                  BCA_BCA_12A_ABC_123ABC, DDD_AAA_CCC_12_123ABC)

OR

Job Dependencies: (DDD_AAA_CCC_12_123ABC)

Pretty much, I am trying to find the values with underscores (_) after "Job Dependencies". I can't get my regex to handle the wrapped lines or work correctly.

Any help is greatly appreciated.

Thanks,

John

1 Solution

gokadroid
Motivator

Setting the other pieces aside and focusing just on the troublesome multivalued Job Dependencies, here is something you can try to see if it works out for you.

Assuming each event has only one "Job Dependencies:" entry, which is a multivalued field, how about first using rex to extract the whole list into a single field, jd, and then splitting it into multiple values in multiJD? After that, mvexpand gives you each value separately:

your query to filter the events
| rex "your rex to get the job name"
| rex field=_raw "Job Dependencies:\s*\((?<jd>[^\)]+)"
| eval multiJD=split(jd, ",")
| mvexpand multiJD
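
To check the pattern outside Splunk, here is a minimal Python sketch of the same two-step idea: capture everything inside the parentheses after "Job Dependencies:", then split on commas. The function name `job_dependencies` and the trimming of whitespace and square brackets are assumptions added for illustration; they are not part of the search above. Python writes named groups as `(?P<name>...)` rather than `(?<name>...)`.

```python
import re

# Sample event text taken from the question (the dependency list wraps onto a second line).
EVENT = """Job Name :          Job ID:
ABC_Job              ADF123

Job Prerequisites: (ABC_ABC_AB2_123_ABC123, AB1_ABC_AB2_123_123ABC)

Job Dependencies: (ABC_ABC_AB2_123_ABC123, [ABC_ABC_AB2_123_123ABC],
                  BCA_BCA_12A_ABC_123ABC, DDD_AAA_CCC_12_123ABC)
"""

# Step 1: same pattern as the rex above, in Python's named-group syntax.
# [^)]+ also matches newlines, so a list that wraps across lines is captured whole.
JD_PATTERN = re.compile(r"Job Dependencies:\s*\((?P<jd>[^)]+)")


def job_dependencies(event: str) -> list[str]:
    """Return the dependency names found after 'Job Dependencies:' (hypothetical helper)."""
    match = JD_PATTERN.search(event)
    if match is None:
        return []
    # Step 2: split on commas, then trim whitespace and optional square brackets
    # (the trimming is an assumption added here, not part of the original search).
    parts = (part.strip(" \t\r\n[]") for part in match.group("jd").split(","))
    return [p for p in parts if p]


print(job_dependencies(EVENT))
```

Because the pattern is anchored on the literal "Job Dependencies:" label, the "Job Prerequisites:" line is never touched. In the search itself, the values produced by split(jd, ",") can still carry leading whitespace, newlines, or square brackets, so a trimming step may be needed there as well.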


woodcock
Esteemed Legend

Try this; it will create a multivalued field:

... | rex max_match=4 "(?ms)(?<Job_Dependency>[^\(\),\[\]\s]+)"

DalJeanis
Legend

To expand on woodcock's code, here's a way to generate test data, followed by a sample of his results and a slightly more complicated rex that you can modify as you like to eliminate any text before the dependencies.

| makeresults
| eval MyDeps = mvappend(
 "Job Dependencies: (ABC_ABC_AB2_123_ABC123, [ABC_ABC_AB2_123_123ABC], BCA_BCA_12A_ABC_123ABC, DDD_AAA_CCC_12_123ABC)",
 "Job Dependencies: ([ABC_ABC_AB2_123_123ABC], BCA_BCA_12A_ABC_123ABC, [DDD_AAA_CCC_12_123ABC])",
 "Job Dependencies: (DDD_AAA_CCC_12_123ABC)",
 "Job Dependencies: ([DDD_AAA_CCC_12_123ABC])",
 "Job Dependencies: (ABC_ABC_AB2_123_ABC123, ABC_ABC_AB2_123_123ABC, BCA_BCA_12A_ABC_123ABC, DDD_AAA_CCC_12_123ABC)")
| mvexpand MyDeps
| rename MyDeps as _raw

Everything above this point just makes some test data.

| rex max_match=10 "(?ms)(?<Job_Dep_Rex1>[^\(\),\[\]\s]+)"
| rex max_match=10 "(?ms)((?:Job Dependencies: )|(?<Job_Dep_Rex2>[^\(\),\[\]\s]+))"
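
The difference between the two patterns can be seen outside Splunk with a short Python sketch. It is only an approximation of what rex does: the group name `dep` and the helper `named_matches` are assumptions for illustration, the named groups use Python's `(?P<name>...)` syntax, the outer capturing group is dropped, and the `(?ms)` flags are omitted because the patterns contain no `.`, `^`, or `$` for them to affect.

```python
import re

# A shortened version of one of the test lines generated above.
LINE = "Job Dependencies: (ABC_ABC_AB2_123_ABC123, [ABC_ABC_AB2_123_123ABC], BCA_BCA_12A_ABC_123ABC)"

# Rex1: any run of characters that is not a bracket, parenthesis, comma, or whitespace.
REX1 = re.compile(r"(?P<dep>[^(),\[\]\s]+)")

# Rex2: the first alternative consumes the label without filling the named group,
# so only the remaining tokens end up in "dep".
REX2 = re.compile(r"(?:Job Dependencies: )|(?P<dep>[^(),\[\]\s]+)")


def named_matches(pattern: re.Pattern, text: str) -> list[str]:
    """Collect every non-empty value of the named group, like a multivalue field."""
    return [m.group("dep") for m in pattern.finditer(text) if m.group("dep")]


rex1_values = named_matches(REX1, LINE)
rex2_values = named_matches(REX2, LINE)
print(rex1_values)  # includes the label tokens "Job" and "Dependencies:"
print(rex2_values)  # only the three dependency names
```

Both patterns are tokenizers rather than anchored extractions, so on a full event (instead of these single-line test values) they would also pick up tokens from other lines, such as the job name or the "Job Prerequisites:" list.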