Splunk Search

How to get index of a particular letter in string

ma_anand1984
Contributor

How can i find index of last occurrence of letter in value of a field

string


splunk_user
microsoft_good_task
god_particle

Now i want below as answer (index of last occurrence of underscore)
7
15
4

How can i achieve this
I don't see any function to help

0 Karma
1 Solution

martin_mueller
SplunkTrust
SplunkTrust

Imagine you're looking in the field message:

...  | rex field=message "_(?<rest>[^_]*)$" | eval pos = if(length(rest) > 0, length(message) - length(rest), -1) | chart values(pos) by message

You'll get -1 if there is no occurrence of the char, and the position if there is at least one.

View solution in original post

timpacl
Path Finder

It has been a while since this thread was active but here is another method to do this:

 len(mvindex(split(lower([string]),"[char]"),0))

Basically, you split [string] at [char] then count the length of the first element in the resulting array to get the 0-based position of [char] in [string]. I add lower around [string] assuming that [string] is a field name and it is unknown what combination of upper-case and lower-case letters might be in it. Because I am showing quotes around [char], it is my assumption that [char] is a fixed text string so you have the opportunity when writing the statement to ensure all characters are in lower case.

You should take note that when [char] does not exist in [string], this will return the length of [string]. You can adjust for that with a more complex formula and while this will return -1 when [char] isn't in [string], it is also more difficult to maintain:

 if(len(mvindex(split(lower([string]),"[char]"),0))=len(lower([string])),-1,len(mvindex(split(lower([string]),"[char]"),0)))

This can be taken a step further. You can use mvcount to get the number of occurrences of [char] in [string]:

 mvcount(split(lower([string]),"[char]"))-1

The array created by split will have 1 more element than the number of occurrences of [char] that it finds in [string]. So if splitting the string "happy" on the char "p", I should get 3 elements "ha", "", "y". mvcount will return the number of elements in the array created by split which is one more that the number of [char]s so we subtract 1 from the result to get the number of occurrences of [char].

I will work on this some more and if I find a more elegant solution, I will post it.

drewski3420
New Member

You should take note that when [char] does not exist in [string], this will return the length of [string]. You can adjust for that with a more complex formula and while this will return -1 when [char] isn't in [string], it is also more difficult to maintain:

if(len(mvindex(split(lower([string]),"[char]"),0))=len(lower([string])),-1,len(mvindex(split(lower([string]),"char]"),0)))

You could also wrap in some ifnull/nullif to avoid having to repeat the formula.

ifnull(nullif(len(mvindex(split(lower([string])),"[char]"),0)),len([string]))),-1)

0 Karma

malvidin
Communicator

To modify @martin_mueller's answer to find where the underscores ("_") are, the "rex" command option, "offset_field", will gather the locations of your match. The "offset_field" option has been available since at least Splunk 6.3.0, but I can't go back farther in the documentation to check when it was introduced. If you only want the first match index, or a limited number of indexed locations, the "max_match" parameters can be changed.

...  
| rex field=message offset_field=message_offsets max_match=0 "(?P<char_match>_)" 
| rex field=message_offsets max_match=0 "char_match=(?P<offset_range>[0-9]+-[0-9]+)"
| chart values(offset_range) by message

 The first "rex" command creates a field named "message_offsets" will contain data like the results of these eval statements, if the character(s) are found.

| eval example_message_offsets = "char_match=7"
| eval example_message_offsets = "char_match=12-12&char_match=19-19"

The second "rex" extracts the index from those values into "offset_range". For one character, the values are the same and separated with a "-". 

If there is no match, both the char_match and the message_offsets fields are null(), so the chart will not include field values that don't contain the string of interest. Unless fillnull, coalesce, or another eval statement is used to populate null values.

0 Karma

martin_mueller
SplunkTrust
SplunkTrust

Imagine you're looking in the field message:

...  | rex field=message "_(?<rest>[^_]*)$" | eval pos = if(length(rest) > 0, length(message) - length(rest), -1) | chart values(pos) by message

You'll get -1 if there is no occurrence of the char, and the position if there is at least one.

martin_mueller
SplunkTrust
SplunkTrust

No, I'm not aware of a Splunk function that does this in one step.

ma_anand1984
Contributor

@martin_mueller
thanks, are you aware of any function that can do this? for instance substr will get string based on index. we should also be getting index based on value using some other function?

0 Karma
Get Updates on the Splunk Community!

Stay Connected: Your Guide to November Tech Talks, Office Hours, and Webinars!

&#x1f342; Fall into November with a fresh lineup of Community Office Hours, Tech Talks, and Webinars we’ve ...

Transform your security operations with Splunk Enterprise Security

Hi Splunk Community, Splunk Platform has set a great foundation for your security operations. With the ...

Splunk Admins and App Developers | Earn a $35 gift card!

Splunk, in collaboration with ESG (Enterprise Strategy Group) by TechTarget, is excited to announce a ...