Hello,
I need regex help. I've spent almost all day on this and have only come up with the sloppy regex below. I feel like it could be more efficient, and it doesn't actually work: when I plug it into the "I'll define my own regular expression" section of the Splunk field extractor, it doesn't do anything.
My Regex:
^Job Dependencies:\s*[([]*(\w+_\w+_\w+_\w+_\w+)[)\]]*|,\s+[([]*(\w+_\w+_\w+_\w+_\w+)[)\]]*,\n|\G\s*[([]*(\w+_\w+_\w+_\w+_\w+)[)\]]*,*
I only need the job dependencies. I know I need to turn them into a multivalue field so that the expected Splunk stats list output looks like this:
Job Name    Job Dependencies
ABC_Job     ABC_ABC_AB2_123_ABC123
            ABC_ABC_AB2_123_123ABC
            BCA_BCA_12A_ABC_123ABC
            DDD_AAA_CCC_12_123ABC
(I don't need help with the Splunk search; I'm just showing it so you know what I'm trying to achieve.)
Since the data also has a "Job Prerequisites:" section with similarly formatted data, my regex would capture that as well, but I don't want it.
Please help. Sample data below:
Job Name : Job ID:
ABC_Job ADF123
Job Prerequisites: (ABC_ABC_AB2_123_ABC123, AB1_ABC_AB2_123_123ABC)
Job Dependencies: (ABC_ABC_AB2_123_ABC123, ABC_ABC_AB2_123_123ABC,
BCA_BCA_12A_ABC_123ABC, DDD_AAA_CCC_12_123ABC)
There's a catch: sometimes the "Job Dependencies" list has square brackets around an entry, or contains just one dependency. For example:
Job Dependencies: (ABC_ABC_AB2_123_ABC123, [ABC_ABC_AB2_123_123ABC],
BCA_BCA_12A_ABC_123ABC, DDD_AAA_CCC_12_123ABC)
OR
Job Dependencies: (DDD_AAA_CCC_12_123ABC)
In short, I am trying to capture the underscore-separated values that follow "Job Dependencies:". I can't get my regex to handle the line wrap or to work correctly.
Any help is greatly appreciated.
Thanks,
John
Ignoring all the other pieces, as requested, and focusing just on the troublesome multivalued Job Dependencies, here is something you can try to see if it works for you. Assuming each event has only one Job Dependencies entry, which is a multivalued field, how about first using rex to pull the whole list out into a single field jd, and then using split to break it into multiple values in multiJD? After that, mvexpand will give you all the values:
your query to filter the events
| rex "your rex to get the job name"
| rex field=_raw "Job Dependencies:\s*\((?<jd>[^\)]+)"
| eval multiJD=split(replace(jd, "[\[\]\s]", ""), ",")
| mvexpand multiJD
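For anyone who wants to sanity-check the two-step idea outside Splunk, here is a rough Python sketch of the same approach: capture everything between the parentheses after "Job Dependencies:", then split on commas. The names SAMPLE and job_dependencies are made up for illustration, and Python's re module is not identical to the PCRE engine Splunk uses, so treat this as a check of the pattern rather than of the SPL. A plain split on commas would leave surrounding whitespace and square brackets on the pieces, so the sketch trims them.

```python
import re
from typing import List

# Sample event from the question, including the wrapped line and a bracketed entry.
SAMPLE = """Job Name : Job ID:
ABC_Job ADF123
Job Prerequisites: (ABC_ABC_AB2_123_ABC123, AB1_ABC_AB2_123_123ABC)
Job Dependencies: (ABC_ABC_AB2_123_ABC123, [ABC_ABC_AB2_123_123ABC],
BCA_BCA_12A_ABC_123ABC, DDD_AAA_CCC_12_123ABC)
"""


def job_dependencies(event: str) -> List[str]:
    """Return the dependency names listed after 'Job Dependencies:' (hypothetical helper)."""
    # Step 1: same pattern as the rex above. [^)] also matches newlines,
    # so a list that wraps onto the next line is captured whole.
    block = re.search(r"Job Dependencies:\s*\(([^)]+)", event)
    if block is None:
        return []
    # Step 2: split on commas, then trim whitespace and square brackets.
    parts = (piece.strip().strip("[]") for piece in block.group(1).split(","))
    return [piece for piece in parts if piece]


if __name__ == "__main__":
    for dep in job_dependencies(SAMPLE):
        print(dep)
```

Because the pattern is anchored on the literal "Job Dependencies:" label, the "Job Prerequisites:" line is never touched.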
Try this; it will create a multivalued field:
... | rex max_match=4 "(?ms)(?<Job_Dependency>[^\(\),\[\]\s]+)"
To expand on woodcock's code: here's a way to generate test data, followed by his rex and a slightly more complicated rex that you can modify as you like to eliminate any text before the dependencies.
| makeresults
| eval MyDeps = mvappend(
"Job Dependencies: (ABC_ABC_AB2_123_ABC123, [ABC_ABC_AB2_123_123ABC], BCA_BCA_12A_ABC_123ABC, DDD_AAA_CCC_12_123ABC)",
"Job Dependencies: ([ABC_ABC_AB2_123_123ABC], BCA_BCA_12A_ABC_123ABC, [DDD_AAA_CCC_12_123ABC])",
"Job Dependencies: (DDD_AAA_CCC_12_123ABC)",
"Job Dependencies: ([DDD_AAA_CCC_12_123ABC])",
"Job Dependencies: (ABC_ABC_AB2_123_ABC123, ABC_ABC_AB2_123_123ABC, BCA_BCA_12A_ABC_123ABC, DDD_AAA_CCC_12_123ABC)")
| mvexpand MyDeps
| rename MyDeps as _raw
Everything above this point just generates the test data; the two rex commands below are the part to compare.
| rex max_match=10 "(?ms)(?<Job_Dep_Rex1>[^\(\),\[\]\s]+)"
| rex max_match=10 "(?ms)((?:Job Dependencies: )|(?<Job_Dep_Rex2>[^\(\),\[\]\s]+))"
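To see what the second rex adds, here is a small Python sketch of the two patterns run against the test lines. It assumes Python's re behaves like Splunk's regex engine for these simple patterns, which should hold here but is not guaranteed in general; REX1, REX2, and extract are illustrative names, and the named group is written in Python's (?P<name>...) syntax. The trick in the second pattern is that the first alternative consumes the literal label without capturing anything, so only the later tokens land in the named group.

```python
import re
from typing import List

# First pattern: every run of characters that is not a bracket, comma, or whitespace.
REX1 = re.compile(r"(?P<dep>[^\(\),\[\]\s]+)", re.M | re.S)

# Second pattern: the first alternative swallows the label, the second captures a token.
REX2 = re.compile(r"(?:Job Dependencies: )|(?P<dep>[^\(\),\[\]\s]+)", re.M | re.S)


def extract(pattern: "re.Pattern[str]", text: str) -> List[str]:
    """Collect the named group from every match, skipping matches where it did not participate."""
    return [m.group("dep") for m in pattern.finditer(text) if m.group("dep") is not None]


if __name__ == "__main__":
    line = "Job Dependencies: (ABC_ABC_AB2_123_ABC123, [ABC_ABC_AB2_123_123ABC])"
    print(extract(REX1, line))  # includes the words of the label
    print(extract(REX2, line))  # only the dependency names
```

Note that the second pattern only removes the "Job Dependencies: " label itself. Any other text in the event, such as the "Job Prerequisites:" line in the original sample, would still be tokenized, which is the part left to modify.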