Hi
I am trying to figure out how to count 'abc' string in the following string field.
2012/07/21 16:18:30 string=bbacbacbaabbacbaabbccaacbacbaabbacbacbaabcccbaabccaacbabca
2012/07/21 16:18:30 string=abccacbabcbbcbacbbbbccbaabcccaacbacbaccbacbacbacbaccbacbac
2012/07/21 16:18:29 string=ccaaabbccaacbacbaccaacbaccaabbccbacabccbacbacbbccaabbbccaa
I could configure MV_ADD in transforms.conf to get multiple values for a same field in a event.
However, I still can not figure out how to set up MV_ADD for string field.
I want to do it without configuring in the transforms.conf if it is possible to do this only by search query.
Does anyone know how to do this?
Thank you!
I have a weird idea. It won't work for all strings, but it should work for strings that end in a letter that appears only once in the string. Since 'c' only appears once, at the end, this might work
yoursearchhere
| eval string2=string + "x"
| eval mysubstrings=split(string2,"c")
| eval mymatches=mvfilter(match(mysubstrings, "ab"))
| eval mymatches=mvfilter(mymatches!=NULL)
| eval numOccurences=mvcount(mymatches)
| fields - mysubstrings mymatches string2
What it does:
1. adds an x at the end of the string, to prevent an "ab" at the end of the string from causing an erroneous match.
2. splits the strings into individual pieces, using the 'c' as the delimiter
3. eliminates any pieces that do not match the "ab" pattern, and also eliminates any null strings
4. counts the number of substrings remaining
Oh, and you don't need MV_ADD for this to work.
How about sourcetype=wordcount | dedup string | rex field=string max_match=10000 "(?<abc>abc)" | eval abc=mvcount(abc) | table abc
- this does the count of abc
in the string (since abc
does not contain itself, it is an easy calculation)
if you're looking to calculate every count of every word, that gets more interesting, but we can play with regex captures and multivalued fields to get there. Things to remember are:
| rex max_matches=10000 "(?<char>..)"
will result in "SP LU NK" instead of "SP PL LU UN NK" that you might want.So, I ended up with a search that captures a character, then the x
characters after it into two separate fields, then stitches them together afterwards:
| stats count | fields - count | eval string="SPLUNK" | rex field=string max_match=9999 "(?<char1>.)(?=(?<char2>.{2}))" | eval chars=mvzip(char1, char2) | fields - char1 char2 string | mvexpand chars | eval chars=replace(chars, ",", "")
The {2}
are the characters after the first one, so that means that you will get words of length 3. The above produces
SPL
PLU
LUN
UNK
Repeating it a few times for different length words (make sure to do 1-character last, since the mvzip seems a bit particular) gives us the awesome:
| stats count | fields - count | eval string="bbacbacbaabbacbaabbccaacbacbaabbacbacbaabcccbaabccaacbabca"
| rex field=string max_match=9999 "(?<char1>.)(?=(?<char2>.{4}))"
| rex field=string max_match=9999 "(?<char1>.)(?=(?<char2>.{3}))"
| rex field=string max_match=9999 "(?<char1>.)(?=(?<char2>.{2}))"
| rex field=string max_match=9999 "(?<char1>.)(?=(?<char2>.{1}))"
| rex field=string max_match=9999 "(?<char1>.)(?=(?<char2>))"
| eval chars=mvzip(char1, char2) | fields - char1 char2 string
| mvexpand chars | eval chars=replace(chars, ",", "") | stats count by chars | sort - count
chars count
1 a 22
2 b 19
3 c 16
4 ba 12
5 cb 9
6 cba 9
7 ac 8
8 acb 8
9 acba 8
10 aa 7
...
Just updated it to polish and leave just the basic and extended answer. Enjoy!
Oh! This is great post, Thank you!
I have a weird idea. It won't work for all strings, but it should work for strings that end in a letter that appears only once in the string. Since 'c' only appears once, at the end, this might work
yoursearchhere
| eval string2=string + "x"
| eval mysubstrings=split(string2,"c")
| eval mymatches=mvfilter(match(mysubstrings, "ab"))
| eval mymatches=mvfilter(mymatches!=NULL)
| eval numOccurences=mvcount(mymatches)
| fields - mysubstrings mymatches string2
What it does:
1. adds an x at the end of the string, to prevent an "ab" at the end of the string from causing an erroneous match.
2. splits the strings into individual pieces, using the 'c' as the delimiter
3. eliminates any pieces that do not match the "ab" pattern, and also eliminates any null strings
4. counts the number of substrings remaining
Oh, and you don't need MV_ADD for this to work.
To share with community, here is something close but actually not working search, but this can do similar job..
sourcetype=wordcount | dedup string | rex field=string max_match=10000 "(?
Iguinn,
Thanks for your post!
This would work particually for a search 'abc'.
What I am trying to do is also
to count how many 'a', 'b', 'c', 'ab', 'ac', 'bc', 'abc' etc.
The reason I need this is my language does not have space between words, and sometimes it is required to get how many times specific characters are found in a description field(string field this time) in an event.
I really appreciate your comments!