Dashboards & Visualizations

Name - value pair - issue in extracting

jenipriya
Explorer

Hi, I am trying to extract the values of certain fields present in the log for a particular operation:

My Query:

Service="X1" Operation="Y1" AND AuditType="REQUEST_OUTBOUND" | sort _time | xmlunescape | xmlkv | fields + A, B

There are multiple values present in the log for fields A & B as below:

Message in Log:

<A>1000</A>
<A>29</A>
<A>30</A>

<B>B1</B>
<B>D2</B>
<B>C3</B>

I need my output as:

A       B
1000    B1
29      D2
30      C3

But I am getting this output instead:

A   B
30  C3

I am getting only the last value of the name-value pair but not all the values.

Could anyone please help me get the desired output? Please let me know which command I should use and how to modify the search.

2 Solutions

Ron_Naken
Splunk Employee

This limitation seems to stem from the xmlkv command in the search app. It appears that the Python script overwrites the previous value each time it encounters a new value for the same key.

Here's the culprit:

for kvpair in XML_KV_RE.findall(rawOut):
    r[kvpair[0]] = kvpair[1]
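
The overwrite is easy to reproduce outside Splunk. The following is only a minimal sketch, not the shipped script: SIMPLE_KV_RE is a simplified stand-in for XML_KV_RE (an assumption), and extract_last_wins is a hypothetical name. It uses the sample message from the question to show why a plain dictionary assignment keeps only the last value per key.

```python
import re

# Simplified stand-in for XML_KV_RE (assumption): matches <tag>value</tag>.
SIMPLE_KV_RE = re.compile(r"<(\w+)>([^<]*)</\1>")

# Sample message from the question.
SAMPLE = "<A>1000</A>\n<A>29</A>\n<A>30</A>\n\n<B>B1</B>\n<B>D2</B>\n<B>C3</B>"


def extract_last_wins(raw):
    """Mimic the original loop: each match overwrites the previous value."""
    fields = {}
    for key, value in SIMPLE_KV_RE.findall(raw):
        fields[key] = value  # same key again -> earlier value is lost
    return fields


print(extract_last_wins(SAMPLE))  # {'A': '30', 'B': 'C3'}
```

This matches the result reported in the question: only A=30 and B=C3 survive.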

It didn't take much time to create a "hack" to fix this, but it's not an ideal solution. This will work:

  1. In $SPLUNK_HOME/etc/apps/search/bin, copy xmlkv.py to myxml.py
  2. In myxml.py, replace the code above with the following, taking care to indent it properly. Match the indentation to the rest of the code in myxml.py: everything under the "for kvpair" line needs to be indented, and the lines under "if not" and under "else:" both need to be indented one level further. If you don't indent properly, you'll probably just get an error in the Splunk UI when you run the command we're building.

        for kvpair in XML_KV_RE.findall(rawOut):
            if not kvpair[0] in r:
                r[kvpair[0]] = kvpair[1]
            else:
                r[kvpair[0]] = r[kvpair[0]] + ", " + kvpair[1]
    
  3. Create/edit $SPLUNK_HOME/etc/apps/search/local/commands.conf and add the following:

    [myxml]
    filename = myxml.py
    retainsevents = true
    overrides_timeorder = false

  4. Restart Splunk

We have just created a new command, myxml, that you can use in place of xmlkv to collect the multivalue XML fields. Repeated XML fields will now show up as comma-separated values. For instance, A = "1000, 29, 30" and B = "B1, D2, C3". If you need to split the fields into individual values, you can use makemv.

Here is a sample search:

sourcetype=xmlkv | myxml | makemv delim=", " A | makemv delim=", " B

When I perform this in the lab, I see A(3) and B(3) in the Field Picker, and I can perform operations on any of the individual values.
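
To see what the modified loop produces without a Splunk instance, here is a rough standalone sketch. It is not the real command: SIMPLE_KV_RE is a simplified stand-in for XML_KV_RE, and extract_joined and split_values are hypothetical helper names. The split step is only a loose analogue of makemv, and pairing A with B by position assumes both tags occur the same number of times and in the same order.

```python
import re

# Simplified stand-in for XML_KV_RE (assumption): matches <tag>value</tag>.
SIMPLE_KV_RE = re.compile(r"<(\w+)>([^<]*)</\1>")

SAMPLE = "<A>1000</A>\n<A>29</A>\n<A>30</A>\n\n<B>B1</B>\n<B>D2</B>\n<B>C3</B>"


def extract_joined(raw):
    """Same branching as the myxml loop: append repeated keys with ', '."""
    fields = {}
    for key, value in SIMPLE_KV_RE.findall(raw):
        if key not in fields:
            fields[key] = value
        else:
            fields[key] = fields[key] + ", " + value
    return fields


def split_values(joined):
    """Loose analogue of makemv: split on commas and trim whitespace."""
    return [part.strip() for part in joined.split(",")]


fields = extract_joined(SAMPLE)
print(fields)  # {'A': '1000, 29, 30', 'B': 'B1, D2, C3'}

# Pair the values by position to get the rows asked for in the question.
rows = list(zip(split_values(fields["A"]), split_values(fields["B"])))
for a, b in rows:
    print(a, b)
```

The loop prints the three rows 1000 B1, 29 D2 and 30 C3, which is the layout requested in the question.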

HTH
ron

Ron_Naken
Splunk Employee

Here is a post of the myxml.py file for the above solution. I wasn't able to get this to format properly in the initial answer:

# Copyright (C) 2005-2010 Splunk Inc. All Rights Reserved. Version 4.0
import sys,splunk.Intersplunk
import re
import urllib
import xml.sax.saxutils as sax

# Captures the tag name and the text between the opening and closing tag.
XML_KV_RE = re.compile("<(.*?)(?:\s[^>]*)?>([^<]*)</\\1>")

try:
    results,dummyresults,settings = splunk.Intersplunk.getOrganizedResults()

    for r in results:
        if "_raw" in r:
            # Unescape repeatedly until the text stops changing.
            raw = r["_raw"]
            rawOut = sax.unescape( raw )
            while( rawOut != raw ):
                raw = rawOut
                rawOut = sax.unescape( raw )
            r["_raw"] = rawOut

            # Append repeated keys instead of overwriting them.
            for kvpair in XML_KV_RE.findall(rawOut):
                if not kvpair[0] in r:
                    r[kvpair[0]] = kvpair[1]
                else:
                    r[kvpair[0]] = r[kvpair[0]] + ", " + kvpair[1]

except:
    import traceback
    stack = traceback.format_exc()
    results = splunk.Intersplunk.generateErrorResults("Error : Traceback: " + str(stack))

splunk.Intersplunk.outputResults( results )
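
The while loop in the script keeps calling sax.unescape until the text stops changing, which matters when the XML in the event has been escaped more than once. Here is a small sketch of that step on its own, runnable without Splunk. The function name unescape_fully and the double-escaped sample string are made up for illustration.

```python
import xml.sax.saxutils as sax


def unescape_fully(raw):
    """Apply sax.unescape repeatedly until the text stops changing."""
    out = sax.unescape(raw)
    while out != raw:
        raw = out
        out = sax.unescape(raw)
    return out


# Hypothetical event text that was XML-escaped twice.
double_escaped = "&amp;lt;A&amp;gt;1000&amp;lt;/A&amp;gt;"

print(sax.unescape(double_escaped))    # &lt;A&gt;1000&lt;/A&gt;
print(unescape_fully(double_escaped))  # <A>1000</A>
```

A single pass only removes one layer of escaping, so the tag regex would find nothing; the loop is what exposes the real tags.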

jenipriya
Explorer

Hi Ron

It did work, and thanks a lot for your reply.

My sincere apologies for the late response; I had some network connectivity issues and couldn't sign in for a long time.

Once again thanks for your valuable inputs.

- Jeni

jenipriya
Explorer

I tried using "search .... | multikv fields A B" but I am still not getting any output, and I am not sure why. Can someone please help me get this fixed?
