Name - value pair - issue in extracting

jenipriya
Explorer

Hi, I am trying to extract the values of certain fields present in the log for a particular operation:

My Query:

Service="X1" Operation="Y1" AND AuditType="REQUEST_OUTBOUND" | sort _time | xmlunescape | xmlkv | fields + A, B

There are multiple values present in the log for fields A & B as below:

Message in Log:

<A>1000</A>
<A>29</A>
<A>30</A>

<B>B1</B>
<B>D2</B>
<B>C3</B>

I need my output as:

A       B
1000    B1
29      D2
30      C3

But I am getting this output:

A   B
30  C3

I am getting only the last value of the name-value pair but not all the values.

Could anyone please help me get the desired output? Please let me know which command I should use and how to modify the search.

2 Solutions

Ron_Naken
Splunk Employee

This limitation seems to stem from the xmlkv command in the search app. It appears that the Python script overwrites the previous value each time it encounters a new value for the same key.

Here's the culprit:

for kvpair in XML_KV_RE.findall(rawOut):
    r[kvpair[0]] = kvpair[1]
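
To see the overwrite outside Splunk, here is a minimal standalone sketch. The sample event and the simplified pattern are assumptions made for illustration; they are not the exact contents of xmlkv.py, and `extract_last_wins` is a hypothetical helper name.

```python
import re

# Simplified stand-in for the pattern used by xmlkv.py (assumption for this demo).
XML_KV_RE = re.compile(r"<(\w+)>([^<]*)</\1>")

# Sample event modeled on the log lines from the question.
raw = "<A>1000</A><A>29</A><A>30</A><B>B1</B><B>D2</B><B>C3</B>"

def extract_last_wins(text):
    """Mimic the loop above: each new value replaces the previous one."""
    r = {}
    for key, value in XML_KV_RE.findall(text):
        r[key] = value  # overwrites any earlier value stored under the same key
    return r

result = extract_last_wins(raw)
print(result)  # {'A': '30', 'B': 'C3'}
```

Only the last value per key survives, which matches the single row (30, C3) reported in the question.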

It didn't take much time to create a "hack" to fix this, but it's not an ideal solution. This will work:

  1. In $SPLUNK_HOME/etc/apps/search/bin, copy xmlkv.py to myxml.py
  2. In myxml.py, replace the code above with the following. Indentation matters: everything under the "for kvpair" line must be indented, and the lines under "if not" and under "else:" must be indented one level further. Match the indentation of the surrounding code in myxml.py. If you don't indent properly, you'll probably just generate an error in the Splunk UI when you run the command we're building.

        for kvpair in XML_KV_RE.findall(rawOut):
            if kvpair[0] not in r:
                r[kvpair[0]] = kvpair[1]
            else:
                r[kvpair[0]] = r[kvpair[0]] + ", " + kvpair[1]

  3. Create/edit $SPLUNK_HOME/etc/apps/search/local/commands.conf and add the following:

    [myxml]
    filename = myxml.py
    retainsevents = true
    overrides_timeorder = false

  4. Restart Splunk
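
The effect of the modified loop can be checked outside Splunk with a small sketch. The simplified pattern, the sample event, and the helper name `extract_all` are assumptions for illustration only; the real script reads events through splunk.Intersplunk.

```python
import re

# Simplified stand-in for the pattern used by xmlkv.py (assumption for this demo).
XML_KV_RE = re.compile(r"<(\w+)>([^<]*)</\1>")

def extract_all(text):
    """Keep every value for a repeated key, joined with ', '."""
    r = {}
    for key, value in XML_KV_RE.findall(text):
        if key not in r:
            r[key] = value                   # first value seen for this key
        else:
            r[key] = r[key] + ", " + value   # append later values
    return r

# Sample event modeled on the log lines from the question.
raw = "<A>1000</A>\n<A>29</A>\n<A>30</A>\n\n<B>B1</B>\n<B>D2</B>\n<B>C3</B>"
fields = extract_all(raw)
print(fields["A"])  # 1000, 29, 30
print(fields["B"])  # B1, D2, C3
```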

We have just created a new command, myxml, that you can use in place of xmlkv to collect the multivalue XML fields. The multivalue XML fields will now show up as comma-separated values. For instance, A = "1000, 29, 30" and B = "B1, D2, C3". If you need to split the fields into individual values, you can use makemv.

Here is a sample search:

sourcetype=xmlkv | myxml | makemv delim="," A | makemv delim="," B

When I perform this in the lab, I see A(3) and B(3) in the Field Picker, and I can perform operations on any of the individual values.
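
For readers who want to picture how the comma-separated strings relate to the row-per-pair table requested in the question, here is a rough Python sketch of the idea. It only illustrates the data shape and is not how Splunk implements makemv; `split_values` is a hypothetical helper, and the whitespace stripping is there because the values were joined with ", ".

```python
def split_values(joined, delim=","):
    """Split a joined field value and strip the space left after each comma."""
    return [v.strip() for v in joined.split(delim)]

a_values = split_values("1000, 29, 30")
b_values = split_values("B1, D2, C3")

# Pair the values by position to get one (A, B) row per original XML pair.
rows = list(zip(a_values, b_values))
for a, b in rows:
    print(a, b)
```

Pairing by position assumes the A and B elements appear in the same order in the event, as they do in the sample log.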

HTH
ron

Ron_Naken
Splunk Employee

Here is the complete myxml.py file for the above solution. I wasn't able to get this to format properly in the initial answer:

# Copyright (C) 2005-2010 Splunk Inc. All Rights Reserved. Version 4.0
import sys,splunk.Intersplunk
import re
import urllib
import xml.sax.saxutils as sax

# Matches <tag ...>value</tag>. Keep the pattern that your own copy of xmlkv.py ships with.
XML_KV_RE = re.compile(r"<(.*?)(?:\s[^>]*)?>([^<]*)</\1>")

try:
    results,dummyresults,settings = splunk.Intersplunk.getOrganizedResults()

    for r in results:
        if "_raw" in r:
            # Unescape XML entities repeatedly until the text stops changing.
            raw = r["_raw"]
            rawOut = sax.unescape( raw )
            while( rawOut != raw ):
                raw = rawOut
                rawOut = sax.unescape( raw )
            r["_raw"] = rawOut

            # Append repeated keys instead of overwriting them.
            for kvpair in XML_KV_RE.findall(rawOut):
                if kvpair[0] not in r:
                    r[kvpair[0]] = kvpair[1]
                else:
                    r[kvpair[0]] = r[kvpair[0]] + ", " + kvpair[1]

except:
    import traceback
    stack = traceback.format_exc()
    results = splunk.Intersplunk.generateErrorResults("Error : Traceback: " + str(stack))

splunk.Intersplunk.outputResults( results )

jenipriya
Explorer

Hi Ron

It did work, and thanks a lot for your reply.

My sincere apologies for the late response; I had some network connectivity issues and couldn't sign in for a long time.

Once again thanks for your valuable inputs.

- Jeni

jenipriya
Explorer

I tried using "search .... | multikv fields A B" but I am still not getting any output, and I am not sure why. Can someone please help me get this fixed?
