- Mark as New
- Bookmark Message
- Subscribe to Message
- Mute Message
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

Hi
I am trying to figure out how to count 'abc' string in the following string field.
2012/07/21 16:18:30 string=bbacbacbaabbacbaabbccaacbacbaabbacbacbaabcccbaabccaacbabca
2012/07/21 16:18:30 string=abccacbabcbbcbacbbbbccbaabcccaacbacbaccbacbacbacbaccbacbac
2012/07/21 16:18:29 string=ccaaabbccaacbacbaccaacbaccaabbccbacabccbacbacbbccaabbbccaa
I could configure MV_ADD in transforms.conf to get multiple values for a same field in a event.
However, I still can not figure out how to set up MV_ADD for string field.
I want to do it without configuring in the transforms.conf if it is possible to do this only by search query.
Does anyone know how to do this?
Thank you!
- Mark as New
- Bookmark Message
- Subscribe to Message
- Mute Message
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

I have a weird idea. It won't work for all strings, but it should work for strings that end in a letter that appears only once in the string. Since 'c' only appears once, at the end, this might work
yoursearchhere
| eval string2=string + "x"
| eval mysubstrings=split(string2,"c")
| eval mymatches=mvfilter(match(mysubstrings, "ab"))
| eval mymatches=mvfilter(mymatches!=NULL)
| eval numOccurences=mvcount(mymatches)
| fields - mysubstrings mymatches string2
What it does:
1. adds an x at the end of the string, to prevent an "ab" at the end of the string from causing an erroneous match.
2. splits the strings into individual pieces, using the 'c' as the delimiter
3. eliminates any pieces that do not match the "ab" pattern, and also eliminates any null strings
4. counts the number of substrings remaining
Oh, and you don't need MV_ADD for this to work.
- Mark as New
- Bookmark Message
- Subscribe to Message
- Mute Message
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
How about sourcetype=wordcount | dedup string | rex field=string max_match=10000 "(?<abc>abc)" | eval abc=mvcount(abc) | table abc
- this does the count of abc
in the string (since abc
does not contain itself, it is an easy calculation)
if you're looking to calculate every count of every word, that gets more interesting, but we can play with regex captures and multivalued fields to get there. Things to remember are:
- Running a regex multiple times will make it pick up where it left off, so for the word SPLUNK,
| rex max_matches=10000 "(?<char>..)"
will result in "SP LU NK" instead of "SP PL LU UN NK" that you might want. - You can use a trick of capturing data in a positive lookahead which doesn't consume any characters
- If you don't have any characters outside the zero-width assertion, Splunk won't advance in the regex and will pull up to max_matches of... the beginning of the string.
- It appears you can't capture both the character and the positive lookahead following it in the same field
So, I ended up with a search that captures a character, then the x
characters after it into two separate fields, then stitches them together afterwards:
| stats count | fields - count | eval string="SPLUNK" | rex field=string max_match=9999 "(?<char1>.)(?=(?<char2>.{2}))" | eval chars=mvzip(char1, char2) | fields - char1 char2 string | mvexpand chars | eval chars=replace(chars, ",", "")
The {2}
are the characters after the first one, so that means that you will get words of length 3. The above produces
SPL
PLU
LUN
UNK
Repeating it a few times for different length words (make sure to do 1-character last, since the mvzip seems a bit particular) gives us the awesome:
| stats count | fields - count | eval string="bbacbacbaabbacbaabbccaacbacbaabbacbacbaabcccbaabccaacbabca"
| rex field=string max_match=9999 "(?<char1>.)(?=(?<char2>.{4}))"
| rex field=string max_match=9999 "(?<char1>.)(?=(?<char2>.{3}))"
| rex field=string max_match=9999 "(?<char1>.)(?=(?<char2>.{2}))"
| rex field=string max_match=9999 "(?<char1>.)(?=(?<char2>.{1}))"
| rex field=string max_match=9999 "(?<char1>.)(?=(?<char2>))"
| eval chars=mvzip(char1, char2) | fields - char1 char2 string
| mvexpand chars | eval chars=replace(chars, ",", "") | stats count by chars | sort - count
chars count
1 a 22
2 b 19
3 c 16
4 ba 12
5 cb 9
6 cba 9
7 ac 8
8 acb 8
9 acba 8
10 aa 7
...
- Mark as New
- Bookmark Message
- Subscribe to Message
- Mute Message
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Just updated it to polish and leave just the basic and extended answer. Enjoy!
- Mark as New
- Bookmark Message
- Subscribe to Message
- Mute Message
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

Oh! This is great post, Thank you!
- Mark as New
- Bookmark Message
- Subscribe to Message
- Mute Message
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

I have a weird idea. It won't work for all strings, but it should work for strings that end in a letter that appears only once in the string. Since 'c' only appears once, at the end, this might work
yoursearchhere
| eval string2=string + "x"
| eval mysubstrings=split(string2,"c")
| eval mymatches=mvfilter(match(mysubstrings, "ab"))
| eval mymatches=mvfilter(mymatches!=NULL)
| eval numOccurences=mvcount(mymatches)
| fields - mysubstrings mymatches string2
What it does:
1. adds an x at the end of the string, to prevent an "ab" at the end of the string from causing an erroneous match.
2. splits the strings into individual pieces, using the 'c' as the delimiter
3. eliminates any pieces that do not match the "ab" pattern, and also eliminates any null strings
4. counts the number of substrings remaining
Oh, and you don't need MV_ADD for this to work.
- Mark as New
- Bookmark Message
- Subscribe to Message
- Mute Message
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

To share with community, here is something close but actually not working search, but this can do similar job..
sourcetype=wordcount | dedup string | rex field=string max_match=10000 "(?
- Mark as New
- Bookmark Message
- Subscribe to Message
- Mute Message
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

Iguinn,
Thanks for your post!
This would work particually for a search 'abc'.
What I am trying to do is also
to count how many 'a', 'b', 'c', 'ab', 'ac', 'bc', 'abc' etc.
The reason I need this is my language does not have space between words, and sometimes it is required to get how many times specific characters are found in a description field(string field this time) in an event.
I really appreciate your comments!
