How can i find index of last occurrence of letter in value of a field
string
splunk_user
microsoft_good_task
god_particle
Now i want below as answer (index of last occurrence of underscore)
7
15
4
How can i achieve this
I don't see any function to help
Imagine you're looking in the field message:
... | rex field=message "_(?<rest>[^_]*)$" | eval pos = if(length(rest) > 0, length(message) - length(rest), -1) | chart values(pos) by message
You'll get -1 if there is no occurrence of the char, and the position if there is at least one.
It has been a while since this thread was active but here is another method to do this:
len(mvindex(split(lower([string]),"[char]"),0))
Basically, you split [string] at [char] then count the length of the first element in the resulting array to get the 0-based position of [char] in [string]. I add lower around [string] assuming that [string] is a field name and it is unknown what combination of upper-case and lower-case letters might be in it. Because I am showing quotes around [char], it is my assumption that [char] is a fixed text string so you have the opportunity when writing the statement to ensure all characters are in lower case.
You should take note that when [char] does not exist in [string], this will return the length of [string]. You can adjust for that with a more complex formula and while this will return -1 when [char] isn't in [string], it is also more difficult to maintain:
if(len(mvindex(split(lower([string]),"[char]"),0))=len(lower([string])),-1,len(mvindex(split(lower([string]),"[char]"),0)))
This can be taken a step further. You can use mvcount to get the number of occurrences of [char] in [string]:
mvcount(split(lower([string]),"[char]"))-1
The array created by split will have 1 more element than the number of occurrences of [char] that it finds in [string]. So if splitting the string "happy" on the char "p", I should get 3 elements "ha", "", "y". mvcount will return the number of elements in the array created by split which is one more that the number of [char]s so we subtract 1 from the result to get the number of occurrences of [char].
I will work on this some more and if I find a more elegant solution, I will post it.
You should take note that when [char] does not exist in [string], this will return the length of [string]. You can adjust for that with a more complex formula and while this will return -1 when [char] isn't in [string], it is also more difficult to maintain:
if(len(mvindex(split(lower([string]),"[char]"),0))=len(lower([string])),-1,len(mvindex(split(lower([string]),"char]"),0)))
You could also wrap in some ifnull/nullif to avoid having to repeat the formula.
ifnull(nullif(len(mvindex(split(lower([string])),"[char]"),0)),len([string]))),-1)
To modify @martin_mueller's answer to find where the underscores ("_") are, the "rex" command option, "offset_field", will gather the locations of your match. The "offset_field" option has been available since at least Splunk 6.3.0, but I can't go back farther in the documentation to check when it was introduced. If you only want the first match index, or a limited number of indexed locations, the "max_match" parameters can be changed.
...
| rex field=message offset_field=message_offsets max_match=0 "(?P<char_match>_)"
| rex field=message_offsets max_match=0 "char_match=(?P<offset_range>[0-9]+-[0-9]+)"
| chart values(offset_range) by message
The first "rex" command creates a field named "message_offsets" will contain data like the results of these eval statements, if the character(s) are found.
| eval example_message_offsets = "char_match=7"
| eval example_message_offsets = "char_match=12-12&char_match=19-19"
The second "rex" extracts the index from those values into "offset_range". For one character, the values are the same and separated with a "-".
If there is no match, both the char_match and the message_offsets fields are null(), so the chart will not include field values that don't contain the string of interest. Unless fillnull, coalesce, or another eval statement is used to populate null values.
Imagine you're looking in the field message:
... | rex field=message "_(?<rest>[^_]*)$" | eval pos = if(length(rest) > 0, length(message) - length(rest), -1) | chart values(pos) by message
You'll get -1 if there is no occurrence of the char, and the position if there is at least one.
No, I'm not aware of a Splunk function that does this in one step.
@martin_mueller
thanks, are you aware of any function that can do this? for instance substr will get string based on index. we should also be getting index based on value using some other function?