Solved: How to configure MV_ADD in the search language?

melonman · ‎07-24-2012

Hi

I am trying to figure out how to count 'abc' string in the following string field.

2012/07/21 16:18:30 string=bbacbacbaabbacbaabbccaacbacbaabbacbacbaabcccbaabccaacbabca
2012/07/21 16:18:30 string=abccacbabcbbcbacbbbbccbaabcccaacbacbaccbacbacbacbaccbacbac
2012/07/21 16:18:29 string=ccaaabbccaacbacbaccaacbaccaabbccbacabccbacbacbbccaabbbccaa

I could configure MV_ADD in transforms.conf to get multiple values for a same field in a event.
However, I still can not figure out how to set up MV_ADD for string field.
I want to do it without configuring in the transforms.conf if it is possible to do this only by search query.

Does anyone know how to do this?

Thank you!

lguinn2 · ‎07-24-2012

I have a weird idea. It won't work for all strings, but it should work for strings that end in a letter that appears only once in the string. Since 'c' only appears once, at the end, this might work

yoursearchhere
| eval string2=string + "x"
| eval mysubstrings=split(string2,"c")
| eval mymatches=mvfilter(match(mysubstrings, "ab"))
| eval mymatches=mvfilter(mymatches!=NULL)
| eval numOccurences=mvcount(mymatches)
| fields - mysubstrings mymatches string2

What it does:

1. adds an x at the end of the string, to prevent an "ab" at the end of the string from causing an erroneous match.

2. splits the strings into individual pieces, using the 'c' as the delimiter

3. eliminates any pieces that do not match the "ab" pattern, and also eliminates any null strings

4. counts the number of substrings remaining

Oh, and you don't need MV_ADD for this to work.

View solution in original post

Jason · ‎03-08-2013

How about sourcetype=wordcount | dedup string | rex field=string max_match=10000 "(?<abc>abc)" | eval abc=mvcount(abc) | table abc - this does the count of abc in the string (since abc does not contain itself, it is an easy calculation)

if you're looking to calculate every count of every word, that gets more interesting, but we can play with regex captures and multivalued fields to get there. Things to remember are:

Running a regex multiple times will make it pick up where it left off, so for the word SPLUNK, | rex max_matches=10000 "(?<char>..)" will result in "SP LU NK" instead of "SP PL LU UN NK" that you might want.
You can use a trick of capturing data in a positive lookahead which doesn't consume any characters
If you don't have any characters outside the zero-width assertion, Splunk won't advance in the regex and will pull up to max_matches of... the beginning of the string.
It appears you can't capture both the character and the positive lookahead following it in the same field

So, I ended up with a search that captures a character, then the x characters after it into two separate fields, then stitches them together afterwards:

| stats count | fields - count | eval string="SPLUNK" | rex field=string max_match=9999 "(?<char1>.)(?=(?<char2>.{2}))" | eval chars=mvzip(char1, char2) | fields - char1 char2 string | mvexpand chars | eval chars=replace(chars, ",", "")

The {2} are the characters after the first one, so that means that you will get words of length 3. The above produces

SPL
PLU
LUN
UNK

Repeating it a few times for different length words (make sure to do 1-character last, since the mvzip seems a bit particular) gives us the awesome:

| stats count | fields - count | eval string="bbacbacbaabbacbaabbccaacbacbaabbacbacbaabcccbaabccaacbabca" 
| rex field=string max_match=9999 "(?<char1>.)(?=(?<char2>.{4}))" 
| rex field=string max_match=9999 "(?<char1>.)(?=(?<char2>.{3}))" 
| rex field=string max_match=9999 "(?<char1>.)(?=(?<char2>.{2}))" 
| rex field=string max_match=9999 "(?<char1>.)(?=(?<char2>.{1}))" 
| rex field=string max_match=9999 "(?<char1>.)(?=(?<char2>))" 
| eval chars=mvzip(char1, char2) | fields - char1 char2 string 
| mvexpand chars | eval chars=replace(chars, ",", "") | stats count by chars | sort - count

    chars   count
1   a       22
2   b       19
3   c       16
4   ba      12
5   cb      9
6   cba     9
7   ac      8
8   acb     8
9   acba    8
10  aa      7
...

Jason · ‎03-08-2013

Just updated it to polish and leave just the basic and extended answer. Enjoy!

melonman · ‎03-08-2013

Oh! This is great post, Thank you!

lguinn2 · ‎07-24-2012

I have a weird idea. It won't work for all strings, but it should work for strings that end in a letter that appears only once in the string. Since 'c' only appears once, at the end, this might work

yoursearchhere
| eval string2=string + "x"
| eval mysubstrings=split(string2,"c")
| eval mymatches=mvfilter(match(mysubstrings, "ab"))
| eval mymatches=mvfilter(mymatches!=NULL)
| eval numOccurences=mvcount(mymatches)
| fields - mysubstrings mymatches string2

What it does:

1. adds an x at the end of the string, to prevent an "ab" at the end of the string from causing an erroneous match.

2. splits the strings into individual pieces, using the 'c' as the delimiter

3. eliminates any pieces that do not match the "ab" pattern, and also eliminates any null strings

4. counts the number of substrings remaining

Oh, and you don't need MV_ADD for this to work.

melonman · ‎08-09-2012

To share with community, here is something close but actually not working search, but this can do similar job..

sourcetype=wordcount | dedup string | rex field=string max_match=10000 "(?...)" | chart count by string chars

melonman · ‎07-24-2012

Iguinn,

Thanks for your post!
This would work particually for a search 'abc'.
What I am trying to do is also
to count how many 'a', 'b', 'c', 'ab', 'ac', 'bc', 'abc' etc.

The reason I need this is my language does not have space between words, and sometimes it is required to get how many times specific characters are found in a description field(string field this time) in an event.

I really appreciate your comments!

How to configure MV_ADD in the search language?

Message Parsing in SOCK

Exploring the OpenTelemetry Collector’s Kubernetes annotation-based discovery

Use ‘em or lose ‘em | Splunk training units do expire