Solved: Re: Extract from dynamic values

neluvasilica · ‎02-14-2020

I have a dynamic set of result data from which I'd like to extract a value when the beginning of a line is the same across multiple values.
For instance, based on this data:

FooBarBla
FooBar
FooBar_Brr

I'd like to end up with: FooBar
Based on this set of data:

foo_bar_brr
foo_bar_grr
foo_bar_gr

I'd like to end up with: foo_bar

The challenge is that the data is different every time and depends on a host input, so I need to do some sort of comparison between the lines and then extract the matching part at the beginning of each. Any ideas how I can achieve this, if it's possible at all?

to4kawa · ‎02-17-2020

UPDATED:

| makeresults 
| eval sample="Application_daily#Application_nighly#Application_test#Android_daily#Android,nighlty#Android.test#ApplicationData#Application-Field#Application Registry" 
| makemv delim="#" sample 
| table sample 
| mvexpand sample 
| eval sample_mod = replace(sample,"(^..*?)([_\-\., ]|(?=[A-Z]))","\1_") 
| eval sample_header = mvindex(split(sample_mod,"_"),0) 
| eventstats count by sample_header

I think this covers the cases that were presented.
I use eventstats for clarity.
Please change to stats as appropriate.
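
For anyone who wants to check the pattern outside Splunk, here is a minimal Python sketch of what the replace/split pair does, assuming Python's `re` treats this pattern the same way as Splunk's regex engine. The names `HEADER_PATTERN` and `sample_header` are only illustrative.

```python
import re
from collections import Counter

# Same pattern as the SPL replace() above: lazily take at least one character,
# then stop at the first delimiter or just before the first uppercase letter.
HEADER_PATTERN = re.compile(r"(^..*?)([_\-\., ]|(?=[A-Z]))")


def sample_header(sample):
    """Mimic replace(...) followed by mvindex(split(sample_mod, "_"), 0)."""
    sample_mod = HEADER_PATTERN.sub(r"\1_", sample, count=1)
    return sample_mod.split("_")[0]


samples = (
    "Application_daily#Application_nighly#Application_test#Android_daily#"
    "Android,nighlty#Android.test#ApplicationData#Application-Field#"
    "Application Registry"
).split("#")

# Rough equivalent of "stats count by sample_header".
counts = Counter(sample_header(s) for s in samples)
print(counts)
```

The lookahead branch is what splits a value like ApplicationData, which has no delimiter at all. The same branch also cuts FooBarBla down to Foo, so the pattern suits the job-name data better than the original FooBar example.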

neluvasilica · ‎02-20-2020

Sorry for the lack of response on this. Thank you, that works great.
I have a slight deviation from the standard dataset, where the sample data would be like:
"Application_daily#Application_nighly#Application_test#Android_daily#Android,nighlty#Android.test#ApplicationData#Application-Field#Application Registry#AUS_ApplicationRegistry#SysTest_Application#EDI_Application#"
and I'd like sample_header to equal Application for sample values like: #AUS_ApplicationRegistry#SysTest_Application#EDI_Application#

I guess the best way to achieve this would be a conditional on the drop-down Jenkins host results.

to4kawa · ‎02-20-2020

....
| eval sample_mod = replace(sample,"(^..*?)([_\-\., ]|(?=[A-Z]))","\1_") 
| rex field=sample "(?<Application>Application)"
| eval sample_header = mvindex(split(sample_mod,"_"),0) 
| eval sample_header=coalesce(Application,sample_header)
| eventstats count by sample_header

In fact, the sample may be different.
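
To see why the extra rex and coalesce are needed, here is a small Python sketch of the same logic, again assuming Python's `re` behaves like Splunk's engine for this pattern. `regex_header` and the `override` parameter are illustrative names, not part of the search.

```python
import re

HEADER_PATTERN = re.compile(r"(^..*?)([_\-\., ]|(?=[A-Z]))")


def regex_header(sample):
    """Header from the replace/split step alone."""
    return HEADER_PATTERN.sub(r"\1_", sample, count=1).split("_")[0]


def sample_header(sample, override="Application"):
    """Equivalent of coalesce(Application, sample_header):
    prefer the literal match when it occurs anywhere in the value."""
    if override in sample:
        return override
    return regex_header(sample)


for value in ["AUS_ApplicationRegistry", "SysTest_Application", "EDI_Application", "Android_daily"]:
    print(value, regex_header(value), sample_header(value))
```

On its own the pattern stops at the first uppercase letter after the first character, so AUS_ApplicationRegistry would come out as A. The literal match takes precedence only when it is found.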

neluvasilica · ‎02-17-2020

Thank you. Having checked a few masters, the common delimiters I can see are: "_", "-", ".", " " (whitespace).
So I think I can use multiple evals in conjunction with mvindex to achieve this.
There are a few exceptions where there is no delimiter at all but only data like:
ApplicationData
ApplicationField
ApplicationRegistry

or duplicate values like:
ApplicationDataApplicationData
ApplicationFieldApplicationField
but I think I'll have to deal with those on a case-by-case basis.
Your suggestion definitely points me in the right direction though.
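
One possible way to handle the duplicated values, sketched in Python: a backreference can detect a value that is exactly the same string written twice. This is only a sketch under that assumption, and `collapse_doubled` is a hypothetical helper name.

```python
import re

# (.+)\1 matches a string followed immediately by an exact copy of itself.
DOUBLED = re.compile(r"(.+)\1")


def collapse_doubled(value):
    """Return the single copy when the whole value is one string repeated twice,
    otherwise return the value unchanged."""
    match = DOUBLED.fullmatch(value)
    return match.group(1) if match else value


print(collapse_doubled("ApplicationDataApplicationData"))
print(collapse_doubled("ApplicationField"))
```

Values that are not an exact doubling pass through untouched, so this step could run before the header extraction.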

koshyk · ‎02-14-2020

I'm a little confused by your example. A more realistic event sample would be good.

oscar84x · ‎02-14-2020

What do you mean by extract? From your example, you want FooBar and foo_bar extracted as a value?

Are you able to provide an actual data sample?

neluvasilica · ‎02-14-2020

Let's assume I have this table:

<dashboard>
   <label>dedup/extract</label>
   <row>
     <panel>
       <table>
         <search>
           <query>| makeresults 
 | eval sample="FooBarBla,FooBar,FooBar_Brr,foo_bar_brr,foo_bar_grr,foo_bar_gr" 
 | makemv delim="," sample | table sample
 </query>
           <earliest>-24h@h</earliest>
           <latest>now</latest>
         </search>
         <option name="count">10</option>
         </table>
     </panel>
   </row>
    </dashboard>

How can I get FooBar and foo_bar extracted as values without specifying FooBar and foo_bar in any regex? The match needs to be based on what the lines have in common at the beginning.

oscar84x · ‎02-14-2020

You just have to find the right regex and that's why we'd like to see some real data. But for the example you're giving I can easily extract the two different terms with one regex as follows:

 | makeresults 
   | eval sample="FooBarBla,FooBar,FooBar_Brr,foo_bar_brr,foo_bar_grr,foo_bar_gr" 
   | makemv delim="," sample | table sample
   | rex field=sample "(?<your_value>[fF]oo_?[bB]ar)"

neluvasilica · ‎02-14-2020

Thank you. The use case I have is querying several Jenkins masters for their job_names.
Some job_names start with, let's say, "Android" and some with "Application". So a search on a Jenkins master would retrieve:

Application_daily
Application_nighly
Application_test
Android_daily
Android_nighlty
Android_test

My aim when querying the list above is to end up with 2 values:

Application
Android

I don't want to use specific strings like Android and Application in the regex but rely on matching the beginning of each line instead. Hope this makes sense.

oscar84x · ‎02-14-2020

Like this? If there's ALWAYS an underscore after your value, that'll make it super easy to identify what you want to extract.

| makeresults 
  | eval sample="Application_daily,Application_nighly,Application_test,Android_daily,Android_nighlty,Android_test" 
  | makemv delim="," sample | table sample
  | rex field=sample "^(?<your_value>\w+)_.*"
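
One thing to be aware of with this pattern: \w also matches the underscore, so a greedy \w+ followed by _ captures up to the last underscore, not the first. A short Python sketch shows the difference (Python writes named groups as `(?P<name>...)`). The lazy variant is an assumption added for comparison and is not from the search above.

```python
import re

# Greedy, as in the rex above: \w also matches "_", so \w+ runs to the LAST underscore.
GREEDY = re.compile(r"^(?P<your_value>\w+)_.*")
# Lazy variant (an assumption for comparison): stops at the FIRST underscore.
LAZY = re.compile(r"^(?P<your_value>\w+?)_.*")


def extract(pattern, sample):
    """Return the captured prefix, or None when the sample has no underscore."""
    match = pattern.match(sample)
    return match.group("your_value") if match else None


for value in ["Application_daily", "foo_bar_brr", "FooBar"]:
    print(value, extract(GREEDY, value), extract(LAZY, value))
```

For the Jenkins job names both variants give the same result because each name has only one underscore. They differ on values like foo_bar_brr.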

neluvasilica · ‎02-14-2020

The problem is I don't know the value beforehand.

oscar84x · ‎02-14-2020

That doesn't matter. But there has to be SOME kind of a pattern to tell regex when to stop capturing. You need to define for the regex what it needs to look for.

neluvasilica · ‎02-17-2020

Unfortunately, there is no specific word or value I can set. The only pattern is searching thousands of values, some starting with the same characters and some with different ones. If the beginning of the line is the same on multiple lines, stop where the lines no longer match and extract that value. Then carry on searching and do the same for the rest of the values.
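
The comparison described here can be sketched outside Splunk. The Python below is one possible reading of the requirement, not a Splunk solution: values are grouped by their first few characters, then the longest common prefix of each group is taken and any trailing delimiter is dropped. The `seed_len` parameter is an assumption added for the sketch. Without some minimum length, Android and Application would share only the prefix "A".

```python
from collections import defaultdict
from os.path import commonprefix

# Delimiters seen in the job names; stripped from the end of a shared prefix.
DELIMITERS = "_-., "


def shared_prefixes(values, seed_len=3):
    """Group values by their first seed_len characters, then return the longest
    common prefix of every group that has more than one member."""
    groups = defaultdict(list)
    for value in values:
        groups[value[:seed_len]].append(value)

    prefixes = []
    for members in groups.values():
        if len(members) > 1:
            # commonprefix compares character by character across all members.
            prefixes.append(commonprefix(members).rstrip(DELIMITERS))
    return sorted(prefixes)


print(shared_prefixes(["FooBarBla", "FooBar", "FooBar_Brr",
                       "foo_bar_brr", "foo_bar_grr", "foo_bar_gr"]))
print(shared_prefixes(["Application_daily", "Application_nighly", "Application_test",
                       "Android_daily", "Android_nighlty", "Android_test"]))
```

The choice of `seed_len` decides how aggressively values are merged, so it would need tuning against real job names.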
