I have a dynamic set of result data from which I'd like to extract the beginning of each line when it is the same across multiple values.
For instance, based on this data:
FooBarBla
FooBar
FooBar_Brr
I'd like to end up with: FooBar
Based on this set of data:
foo_bar_brr
foo_bar_grr
foo_bar_gr
I'd like to end up with: foo_bar
The challenge I face is that the data is different every time and depends on a host input, so I need to do some sort of comparison between the lines and then extract the matching bit at the beginning of each one. Any ideas how I can achieve this, if it is possible at all?
UPDATED:
| makeresults
| eval sample="Application_daily#Application_nighly#Application_test#Android_daily#Android,nighlty#Android.test#ApplicationData#Application-Field#Application Registry"
| makemv delim="#" sample
| table sample
| mvexpand sample
| eval sample_mod = replace(sample,"(^..*?)([_\-\., ]|(?=[A-Z]))","\1_")
| eval sample_header = mvindex(split(sample_mod,"_"),0)
| eventstats count by sample_header
I think this covers what was presented. I use eventstats for clarity; please change it to stats as appropriate.
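For anyone who wants to check what the replace()/split() pair does outside Splunk, here is a rough Python sketch of the same idea. The helper name sample_header and the use of Python's re module are my own assumptions for illustration; only the regex and the sample string come from the search above.

```python
import re
from collections import Counter

# Same pattern as in the eval above: lazily capture from the start of the
# value up to the first delimiter (_ - . , space) or the next capital letter.
HEADER_RE = re.compile(r"(^..*?)([_\-\., ]|(?=[A-Z]))")


def sample_header(sample):
    """Hypothetical helper: turn the first boundary into "_" and keep what precedes it."""
    sample_mod = HEADER_RE.sub(r"\1_", sample, count=1)
    return sample_mod.split("_")[0]


sample = (
    "Application_daily#Application_nighly#Application_test#Android_daily#"
    "Android,nighlty#Android.test#ApplicationData#Application-Field#Application Registry"
)

# Rough equivalent of "stats count by sample_header".
counts = Counter(sample_header(s) for s in sample.split("#"))
print(counts)
```

With the sample string above this should count 6 values under Application and 3 under Android.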
Sorry for the lack of response on this. Thank you, that works great.
I have a slight deviation from the standard dataset where the sample data would be like:
"Application_daily#Application_nighly#Application_test#Android_daily#Android,nighlty#Android.test#ApplicationData#Application-Field#Application Registry#AUS_ApplicationRegistry#SysTest_Application#EDI_Application#"
and I'd like sample_header to equal Application for sample values like: #AUS_ApplicationRegistry#SysTest_Application#EDI_Application#
I guess the best way to achieve this would be a conditional on the drop-down Jenkins host results.
....
| eval sample_mod = replace(sample,"(^..*?)([_\-\., ]|(?=[A-Z]))","\1_")
| rex field=sample "(?<Application>Application)"
| eval sample_header = mvindex(split(sample_mod,"_"),0)
| eval sample_header=coalesce(Application,sample_header)
| eventstats count by sample_header
In fact, the sample may be different.
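To make the rex + coalesce step easier to reason about, here is a small Python sketch of the same fallback logic. The function name and the keyword parameter are hypothetical, chosen only for illustration; the regex is the one from the eval above.

```python
import re

# Pattern copied from the eval: stop at the first delimiter or capital letter.
HEADER_RE = re.compile(r"(^..*?)([_\-\., ]|(?=[A-Z]))")


def sample_header(sample, keyword="Application"):
    """Hypothetical helper mirroring rex + coalesce."""
    # If the keyword occurs anywhere in the value it wins (the rex field),
    # otherwise fall back to the part before the first boundary.
    if keyword in sample:
        return keyword
    return HEADER_RE.sub(r"\1_", sample, count=1).split("_")[0]


for value in ["AUS_ApplicationRegistry", "SysTest_Application", "EDI_Application", "Android.test"]:
    print(value, "->", sample_header(value))
```

Without the keyword override, a value such as EDI_Application would come out as just E, because the capital-letter lookahead fires right after the first character. That is why the override is needed for these names.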
Thank you. Having checked a few masters, the common delimiters I can see are: "_", "-", ".", " " (whitespace)
So I think I can use multiple evals in conjunction with mvindex to achieve this.
There are a few exceptions where there is no delimiter at all but only data like:
ApplicationData
ApplicationField
ApplicationRegistry
or duplicate values like:
ApplicationDataApplicationData
ApplicationFieldApplicationField
but I think I'll have to deal with those on a case-by-case basis.
Your suggestion definitely points me in the right direction though.
I'm a little confused by your example. A more realistic event sample would be good.
What do you mean by extract? From your example, you want FooBar and foo_bar extracted as a value?
Are you able to provide an actual data sample?
Let's assume I have this table:
<dashboard>
<label>dedup/extract</label>
<row>
<panel>
<table>
<search>
<query>| makeresults
| eval sample="FooBarBla,FooBar,FooBar_Brr,foo_bar_brr,foo_bar_grr,foo_bar_gr"
| makemv delim="," sample | table sample
</query>
<earliest>-24h@h</earliest>
<latest>now</latest>
</search>
<option name="count">10</option>
</table>
</panel>
</row>
</dashboard>
How can I get FooBar and foo_bar extracted as values without specifying FooBar and foo_bar in any regex? The match needs to be based on what is common at the beginning of each line.
You just have to find the right regex and that's why we'd like to see some real data. But for the example you're giving I can easily extract the two different terms with one regex as follows:
| makeresults
| eval sample="FooBarBla,FooBar,FooBar_Brr,foo_bar_brr,foo_bar_grr,foo_bar_gr"
| makemv delim="," sample | table sample
| rex field=sample "(?<your_value>[fF]oo_?[bB]ar)"
Thank you. The use case I have is querying several Jenkins masters for their job_names.
Some job_names start with, let's say, "Android" and some with "Application". So a search on a Jenkins master would retrieve:
Application_daily
Application_nighly
Application_test
Android_daily
Android_nighlty
Android_test
My aim when querying the list above is to end up with 2 values:
Application
Android
I don't want to use specific strings like Android and Application in the regex but rely on matching the beginning of each line instead. Hope this makes sense.
Like this? If there's ALWAYS an underscore after your value, that'll make it super easy to identify what you want to extract.
| makeresults
| eval sample="Application_daily,Application_nighly,Application_test,Android_daily,Android_nighlty,Android_test"
| makemv delim="," sample | table sample
| rex field=sample "^(?<your_value>\w+)_.*"
The problem is that I don't know the value beforehand.
That doesn't matter. But there has to be SOME kind of a pattern to tell regex when to stop capturing. You need to define for the regex what it needs to look for.
There is no specific word or value I can set, unfortunately. The only pattern is searching thousands of values, some starting with the same characters and some with different ones. If the beginning of the line is the same on multiple lines, stop where the lines no longer match and extract the value. Then carry on searching and do the same for the rest of the values.
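The comparison described here can be sketched without any regex: sort the values, then keep shrinking a running common prefix while neighbouring values still share enough leading characters. This is only a rough Python sketch under my own assumptions, not something Splunk does for you. The function name group_prefixes, the min_len threshold (needed because Android and Application both start with "A") and the list of trailing delimiters to strip are all hypothetical choices.

```python
from os.path import commonprefix


def group_prefixes(values, min_len=3, strip_chars="_-., "):
    """Hypothetical helper: one shared prefix per run of similar values.

    min_len is an assumed threshold: a shared prefix shorter than this is
    treated as "no real match" and starts a new group.
    """
    prefixes = []
    current = None
    for value in sorted(set(values)):
        if current is None:
            current = value
            continue
        # commonprefix compares character by character, which is what we want.
        shared = commonprefix([current, value])
        if len(shared) >= min_len:
            current = shared  # still the same group, prefix may shrink
        else:
            prefixes.append(current.rstrip(strip_chars))
            current = value  # start a new group
    if current is not None:
        prefixes.append(current.rstrip(strip_chars))
    return prefixes


jobs = [
    "Application_daily", "Application_nighly", "Application_test",
    "Android_daily", "Android_nighlty", "Android_test",
]
print(group_prefixes(jobs))
```

For the job list above this should give Android and Application, and for the original example it should give FooBar and foo_bar. Note that a value with no similar neighbour is returned whole, and the right min_len depends on the data.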