Extract from dynamic values

neluvasilica
Explorer

I have a dynamic set of result data, and I'd like to extract the part that is the same at the beginning of each line across multiple values.
For instance, based on this data:

FooBarBla
FooBar
FooBar_Brr

I'd like to end up with: FooBar
Based on this set of data:

foo_bar_brr
foo_bar_grr
foo_bar_gr

I'd like to end up with: foo_bar

The challenge I face is that the data is different every time and depends on a host input, so I need to do some sort of comparison between the lines and then extract the matching part at the beginning of each. Any ideas how I can achieve this, if it is possible at all?
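
For reference, the comparison described here is a longest-common-prefix problem. Below is a minimal sketch in Python (not SPL), only to illustrate the logic. The helper name `shared_start` and the set of trailing delimiters it strips are assumptions made for this example, not something from the thread.

```python
from os.path import commonprefix


def shared_start(values, strip_chars="_-., "):
    """Return the longest prefix shared by all values, minus trailing delimiters.

    strip_chars is an assumed delimiter set; adjust it to the real data.
    """
    # commonprefix compares the strings character by character.
    return commonprefix(list(values)).rstrip(strip_chars)


print(shared_start(["FooBarBla", "FooBar", "FooBar_Brr"]))        # FooBar
print(shared_start(["foo_bar_brr", "foo_bar_grr", "foo_bar_gr"]))  # foo_bar
```

Note that the raw common prefix of the second set is `foo_bar_`, so the trailing underscore has to be stripped to get `foo_bar`.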

to4kawa
SplunkTrust

UPDATED:

| makeresults 
| eval sample="Application_daily#Application_nighly#Application_test#Android_daily#Android,nighlty#Android.test#ApplicationData#Application-Field#Application Registry" 
| makemv delim="#" sample 
| table sample 
| mvexpand sample 
| eval sample_mod = replace(sample,"(^..*?)([_\-\., ]|(?=[A-Z]))","\1_") 
| eval sample_header = mvindex(split(sample_mod,"_"),0) 
| eventstats count by sample_header

I think this covers the cases that were presented.
I used eventstats for clarity.
Please change it to stats as appropriate.
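
To make the rule in the search above easier to follow, here is a rough Python equivalent (a sketch, not SPL): the header is the text before the first delimiter (`_`, `-`, `.`, `,`, space) or before the first uppercase letter after the first character. The names `HEADER` and `header` are made up for this illustration.

```python
import re
from collections import Counter

# Lazily capture from the start up to the first delimiter, the first
# camel-case boundary (an uppercase letter ahead), or the end of the string.
HEADER = re.compile(r"^(..*?)(?:[_\-., ]|(?=[A-Z])|$)")


def header(name):
    """Return the leading token of name, or name itself if nothing matches."""
    m = HEADER.match(name)
    return m.group(1) if m else name


sample = ("Application_daily#Application_nighly#Application_test#Android_daily"
          "#Android,nighlty#Android.test#ApplicationData#Application-Field"
          "#Application Registry").split("#")

# Roughly what "count by sample_header" reports for each header.
counts = Counter(header(s) for s in sample)
print(counts)
```

On this sample every value ends up under either `Application` or `Android`. A value such as `foo_bar_brr` would give `foo`, not `foo_bar`, because the rule stops at the first delimiter.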

neluvasilica
Explorer

Sorry for the lack of response on this. Thank you, that works great.
I have a slight deviation from the standard dataset, where the sample data would look like:
"Application_daily#Application_nighly#Application_test#Android_daily#Android,nighlty#Android.test#ApplicationData#Application-Field#Application Registry#AUS_ApplicationRegistry#SysTest_Application#EDI_Application#"
and I'd like sample_header to equal Application for sample values like: #AUS_ApplicationRegistry#SysTest_Application#EDI_Application#

I guess the best way to achieve this would be a conditional on the drop-down Jenkins host results.

to4kawa
SplunkTrust
....
| eval sample_mod = replace(sample,"(^..*?)([_\-\., ]|(?=[A-Z]))","\1_") 
| rex field=sample "(?<Application>Application)"
| eval sample_header = mvindex(split(sample_mod,"_"),0) 
| eval sample_header=coalesce(Application,sample_header)
| eventstats count by sample_header

In practice, the real data may differ from this sample.
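
The fallback logic of the `rex` plus `coalesce` pair can be sketched in Python as follows (an illustration only, not SPL). The helper names `first_token` and `sample_header` and the `override` parameter are assumptions for this example, and the camel-case rule from the earlier search is left out to keep it short.

```python
import re


def first_token(name):
    """Text before the first "_", "-", ".", "," or space (assumed delimiters)."""
    return re.split(r"[_\-., ]", name, maxsplit=1)[0]


def sample_header(name, override="Application"):
    """Prefer the override when it occurs anywhere in name, like
    coalesce(Application, sample_header); otherwise use the first token."""
    return override if override in name else first_token(name)


for value in ["AUS_ApplicationRegistry", "SysTest_Application",
              "EDI_Application", "Android_daily"]:
    print(value, "->", sample_header(value))
```

The first three values map to `Application` because the override matches somewhere in the string; `Android_daily` falls back to its first token.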

neluvasilica
Explorer

Thank you. Having checked a few masters, the common delimiters I can see are: "_", "-", ".", and " " (whitespace).
So I think I can use multiple evals in conjunction with mvindex to achieve this.
There are a few exceptions where there is no delimiter at all, only data like:
ApplicationData
ApplicationField
ApplicationRegistry

or duplicate values like:
ApplicationDataApplicationData
ApplicationFieldApplicationField
but I think I'll have to deal with those on a case-by-case basis.
Your suggestion definitely points me in the right direction though.
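
For the doubled values mentioned above, one possible pre-processing step is to collapse a string that consists of the same text written twice. This is a hedged Python sketch (not SPL); `collapse_doubled` is a hypothetical helper name.

```python
import re

# A string made of some text immediately followed by an identical copy.
DOUBLED = re.compile(r"(.+?)\1")


def collapse_doubled(name):
    """Return one half of an exactly doubled string, else name unchanged."""
    m = DOUBLED.fullmatch(name)
    return m.group(1) if m else name


print(collapse_doubled("ApplicationDataApplicationData"))    # ApplicationData
print(collapse_doubled("ApplicationFieldApplicationField"))  # ApplicationField
print(collapse_doubled("ApplicationData"))                   # ApplicationData
```

The obvious caveat is that a legitimate name that happens to be a doubling (for example `toto`) would also be collapsed, so this only suits data where that cannot occur.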

koshyk
Super Champion

I'm a little confused by your example. A more realistic event sample would help.

oscar84x
Contributor

What do you mean by extract? From your example, you want FooBar and foo_bar extracted as a value?

Are you able to provide an actual data sample?

neluvasilica
Explorer

Let's assume I have this table:

<dashboard>
   <label>dedup/extract</label>
   <row>
     <panel>
       <table>
         <search>
           <query>| makeresults 
 | eval sample="FooBarBla,FooBar,FooBar_Brr,foo_bar_brr,foo_bar_grr,foo_bar_gr" 
 | makemv delim="," sample | table sample
 </query>
           <earliest>-24h@h</earliest>
           <latest>now</latest>
         </search>
         <option name="count">10</option>
         </table>
     </panel>
   </row>
    </dashboard>

How can I get FooBar and foo_bar extracted as values without specifying FooBar and foo_bar in any regex? The match needs to be based on what the lines have in common at the beginning.

oscar84x
Contributor

You just have to find the right regex, and that's why we'd like to see some real data. For the example you're giving, though, I can easily extract the two different terms with one regex as follows:

 | makeresults 
   | eval sample="FooBarBla,FooBar,FooBar_Brr,foo_bar_brr,foo_bar_grr,foo_bar_gr" 
   | makemv delim="," sample | table sample
   | rex field=sample "(?<your_value>[fF]oo_?[bB]ar)"

neluvasilica
Explorer

Thank you. The use case I have is querying several Jenkins masters for their job_names.
Some job_names start with, let's say, "Android" and some with "Application". So a search on a Jenkins master would retrieve:

Application_daily
Application_nighly
Application_test
Android_daily
Android_nighlty
Android_test

My aim when querying the list above is to end up with 2 values:

Application
Android

I don't want to use specific strings like Android and Application in the regex, but rely on matching the beginning of each line instead. I hope this makes sense.

oscar84x
Contributor

Like this? If there's ALWAYS an underscore after your value, that'll make it super easy to identify what you want to extract.

| makeresults 
  | eval sample="Application_daily,Application_nighly,Application_test,Android_daily,Android_nighlty,Android_test" 
  | makemv delim="," sample | table sample
  | rex field=sample "^(?<your_value>\w+)_.*"
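
To show what that pattern captures, here is the same regex tried in Python (a sketch outside Splunk; Python writes the named group as `(?P<name>...)`). The helper name `prefix_before_underscore` is made up for this example.

```python
import re

# Same pattern as the rex above, in Python's named-group syntax.
PREFIX = re.compile(r"^(?P<your_value>\w+)_.*")


def prefix_before_underscore(name):
    """Return the captured prefix, or None when the name has no underscore."""
    m = PREFIX.match(name)
    return m.group("your_value") if m else None


jobs = ["Application_daily", "Application_nighly", "Application_test",
        "Android_daily", "Android_nighlty", "Android_test"]

# Distinct prefixes across all job names.
distinct = sorted({p for p in map(prefix_before_underscore, jobs) if p})
print(distinct)  # ['Android', 'Application']
```

One thing to keep in mind: `\w` also matches underscores and `\w+` is greedy, so a name with several underscores such as `foo_bar_brr` yields `foo_bar` (everything before the last underscore), not `foo`.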

neluvasilica
Explorer

The problem is that I don't know the value beforehand.

oscar84x
Contributor

That doesn't matter. But there has to be SOME kind of a pattern to tell regex when to stop capturing. You need to define for the regex what it needs to look for.

neluvasilica
Explorer

There is no specific word or value I can set, unfortunately. The only approach is to search through thousands of values, some starting with the same characters and some with different ones. If the beginning of a line is the same on multiple lines, stop where the match ends and extract that value. Then carry on searching and do the same for the rest of the values.
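
The procedure described here could be sketched like this in Python (not SPL, and only one possible reading of the requirement): sort the values, keep narrowing a running prefix while neighbouring values still share enough leading characters, and emit the prefix when the match breaks. The function name `shared_prefixes`, the `min_len` threshold, and the stripped delimiter set are all assumptions for this illustration.

```python
from os.path import commonprefix


def shared_prefixes(values, min_len=3, strip_chars="_-., "):
    """Group sorted values by their running common prefix.

    min_len (assumed threshold) is the shortest overlap that still counts
    as "the same beginning"; shorter overlaps start a new group.
    """
    prefixes = []
    current = None
    for value in sorted(values):
        if current is None:
            current = value
            continue
        candidate = commonprefix([current, value])
        if len(candidate) >= min_len:
            current = candidate  # still the same group, narrow the prefix
        else:
            prefixes.append(current.rstrip(strip_chars))
            current = value      # the match broke, start a new group
    if current is not None:
        prefixes.append(current.rstrip(strip_chars))
    return prefixes


jobs = ["Application_daily", "Application_nighly", "Application_test",
        "Android_daily", "Android_nighlty", "Android_test"]
print(shared_prefixes(jobs))  # ['Android', 'Application']
```

A value that shares its beginning with no other value is emitted whole, and two values that share a long beginning (say `Android_daily` and `Android_daily2`) would produce that long prefix, so the threshold and the grouping rule would need tuning against real job names.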
