Search on XML Multiple Key Attribute Pairs

Path Finder

I am stuck trying to extract the values from the XML below:

   <SearchElements>
    <entry key="FirstName">%</entry>
    <entry key="Gender">MALE</entry>
    <entry key="State">VA</entry>   
</SearchElements> 

I expect the regex to output the values as:

%, MALE, VA 

Here is the regex, which doesn't work as expected. Please let me know what's going wrong:

rex field=abc "(?ms)\<entry key="\w+"\>(?P<abc>[^<]+)<\entry>"

Motivator

You don't need to match the whole tag - just match everything up to the start of the next one. Also, be careful with your slashes - in your example you have <\entry> instead of </entry> - and remember to escape the double quotes inside the quoted regex (write \" rather than ").

| rex max_match=50 field=abc "(?ms)\<entry key=\"\w+\"\>(?<value>[^\<]+)"
| eval valuelist=mvjoin(value, ", ")

Here's how the regex will be processed:

  • Look for the exact leading text <entry key="
  • Move past one or more "word" characters, indicated by \w+
  • Move past a closing quotation mark and a closing angle bracket (>)
  • Fill the named capture group "value" with one or more characters that are not an opening angle bracket (<)
  • That's it - you're done! The only reason to keep matching would be if you either had multiple similar formats, or if you needed to capture more fields.
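
To check the pattern outside Splunk, the steps above can be reproduced in a few lines of Python. This is only a sketch: Python's re module stands in for Splunk's regex engine, the named group is written (?P<value>...) because that is the form Python accepts, and the variable names (sample, pattern, values) are made up for the illustration.

```python
import re

# Sample event text copied from the question.
sample = """<SearchElements>
    <entry key="FirstName">%</entry>
    <entry key="Gender">MALE</entry>
    <entry key="State">VA</entry>
</SearchElements>"""

# Same logic as the rex pattern: the literal text '<entry key="', one or more
# word characters, then '">', then capture everything up to the next '<'.
pattern = re.compile(r'<entry key="\w+">(?P<value>[^<]+)')

# Collect the named group from every match in the text.
values = [m.group("value") for m in pattern.finditer(sample)]
print(values)  # ['%', 'MALE', 'VA']
```

Because the capture stops at the first <, the closing </entry> tag never has to be matched at all.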

Here's what the Splunk commands are doing:

  • rex applies the regex repeatedly, up to 50 times (max_match=50) or until no more matches are found, and puts each match of the named capture group value into a multivalue field named value.
  • eval will then join all of these matches into a single line of text, putting a comma and a space between each match.
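
The effect of the two commands can be imitated in Python to see what each one contributes. This is a rough sketch, not how Splunk implements them: rex_values and this mvjoin are hypothetical helper names that only mimic rex with max_match and the eval mvjoin() function, and event is the question's XML squeezed onto one line.

```python
import re
from itertools import islice

def rex_values(text, max_match=50):
    """Return at most max_match captured values, imitating rex max_match=N."""
    pattern = re.compile(r'<entry key="\w+">([^<]+)')
    return [m.group(1) for m in islice(pattern.finditer(text), max_match)]

def mvjoin(values, delimiter):
    """Join a list of values into one string, imitating eval's mvjoin()."""
    return delimiter.join(values)

event = ('<entry key="FirstName">%</entry>'
         '<entry key="Gender">MALE</entry>'
         '<entry key="State">VA</entry>')

value = rex_values(event)         # all matches, capped at 50
valuelist = mvjoin(value, ", ")   # one comma-separated string
print(valuelist)                  # %, MALE, VA

# With a limit of 1 only the first value is kept.
print(rex_values(event, max_match=1))  # ['%']
```

The second call shows why max_match matters here: rex defaults to max_match=1, so without the higher limit only the first entry would be captured.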

Learning Regular Expressions

Get a good regex tester like Kodos or RegexBuddy, and take a good look at regular-expressions.info if you need to practice. That's usually easier than trying to debug regexes in the Splunk command line. Also, try to work out exactly what's going on in each of the other examples people have posted. Getting a handle on the examples is the key to being able to adapt them to your own needs more quickly.

Path Finder

Thanks for the wonderful explanation. That's really informative.