Splunk Search

How to extract multivalue fields from XML data at search-time?

andra_pietraru
Path Finder

Hi all,

I indexed a XML file and I am trying to extract some fields at search-time.

What I'm trying to do is extract two fields, but I am failing because it is repeating. Right now, Splunk takes only my first match. Any help?? Thanks!

1 Solution

Gilberto_Castil
Splunk Employee
Splunk Employee

You do not need to use a regular expression to obtain the data. Splunk is already instrumented with an automated extraction feature that will look into the XML and assign the value between the tags. A multi-value is normally handled. Let's ellaborate:

Assume that you have a data point as follows:

<switch>
    <managed>true</managed>
       <ipv4address>111.111.111.11</ipv4address>
       <powereddevice>
         <macaddress>000000000001</macaddress>
         <port>11</port>
       </powereddevice>
       <powereddevice>
         <macaddress>000000000002</macaddress>
         <port>9</port>
       </powereddevice>
       <powereddevice>
         <macaddress>000000000003</macaddress>
         <port>7</port>
       </powereddevice>
       <powereddevice>
         <macaddress>000000000004</macaddress>
         <port>5</port>
       </powereddevice>
       <powereddevice>
         <macaddress>000000000005</macaddress>
         <port>3</port>
       </powereddevice>
       <powereddevice>
         <macaddress>000000000006</macaddress>
         <port>1</port>
       </powereddevice>
</switch>

Here is a generic inputs.conf configuration:

[monitor:///media/answers/04-15-1015/1/]
disabled = false
index = test
sourcetype = answers-1429106201

This is the companion props.conf. You need to call this explicitly.

[answers-1429106201]
KV_MODE = xml

The end result is something like this:

alt text

This KV_MODE property is a search-time change so you do not need to restart your Splunk instance. You will need to perform that change directly on the Search Head of your deployment.

I hope this helps you,

--
gc

View solution in original post

Gilberto_Castil
Splunk Employee
Splunk Employee

You do not need to use a regular expression to obtain the data. Splunk is already instrumented with an automated extraction feature that will look into the XML and assign the value between the tags. A multi-value is normally handled. Let's ellaborate:

Assume that you have a data point as follows:

<switch>
    <managed>true</managed>
       <ipv4address>111.111.111.11</ipv4address>
       <powereddevice>
         <macaddress>000000000001</macaddress>
         <port>11</port>
       </powereddevice>
       <powereddevice>
         <macaddress>000000000002</macaddress>
         <port>9</port>
       </powereddevice>
       <powereddevice>
         <macaddress>000000000003</macaddress>
         <port>7</port>
       </powereddevice>
       <powereddevice>
         <macaddress>000000000004</macaddress>
         <port>5</port>
       </powereddevice>
       <powereddevice>
         <macaddress>000000000005</macaddress>
         <port>3</port>
       </powereddevice>
       <powereddevice>
         <macaddress>000000000006</macaddress>
         <port>1</port>
       </powereddevice>
</switch>

Here is a generic inputs.conf configuration:

[monitor:///media/answers/04-15-1015/1/]
disabled = false
index = test
sourcetype = answers-1429106201

This is the companion props.conf. You need to call this explicitly.

[answers-1429106201]
KV_MODE = xml

The end result is something like this:

alt text

This KV_MODE property is a search-time change so you do not need to restart your Splunk instance. You will need to perform that change directly on the Search Head of your deployment.

I hope this helps you,

--
gc

andra_pietraru
Path Finder

Can you give me a hint on how I could retrieve the corresponding port for a specific macaddress?
Right now, when I do the following query, I get all the ports:

sourcetype=xmlConfig switch.powereddevice.macaddress=000000000001 |rename switch.powereddevice.port AS port | table port

So, instead of getting only port "11", I get all "11,9,7,5,3,1" ports.

0 Karma

aweitzman
Motivator

In order to get the corresponding port to a particular MAC address, you need to associate them first using mvzip:

... | eval MACAndPort = mvzip(switch.powereddevice.macaddress,switch.powereddevice.port) | mvexpand MACAndPort | rex field=MACAndPort "(?<MAC>\d+),(?<Port>\d+)" | ...

andra_pietraru
Path Finder

*Update: solved this error: I tried, but I get error "Error in 'eval' command: The arguments to the 'mvzip' function are invalid." Should I add something to the mvzip function?

**Update 2: I also found an answer for this question: But now my question is how could I retrieve only the port, based on a mac address?

Thank you for your help!

0 Karma

aweitzman
Motivator

Glad you figured out mvzip.

Anyways, if you know what your MAC address is, just append | where MAC=themacaddress to the end of your search string.

andra_pietraru
Path Finder

It did help me. Thanks!

0 Karma

fdi01
Motivator

If you check out the doc on the rex command you'll see that max_match= Controls the number of times the regex is matched. If greater than 1, the resulting fields will be multivalued fields. Defaults to 1, use 0 to mean unlimited.
try like this:

...| rex max_match=0    your_regex...
or 
...| rex max_match=0  field=_raw  your_regex...

see this link to understand fine:
http://answers.splunk.com/answers/227331/using-rex-to-extract-multivalue-fields-from-events.html

aweitzman
Motivator

Can you please show us the command you're using? For instance, when I do this:

| gentimes start=-1 
| eval xml="[your XML above]" 
| rename xml as _raw 
| spath 
| table switch.powereddevice.macaddress switch.powereddevice.port

I get two multivalued fields containing all the values.

joxley
Path Finder

+1 for | rename xml as _raw

0 Karma

andra_pietraru
Path Finder

I didn't use a command. I uploaded the XML file and indexed it and now i'm trying to extract fields, because I will later need to use them for another search command.
Just to explain better: I have other events (from a different source) and they have the IP addresses as fields. What I want is: based on an IP address, search this XML file and get back the port and the macaddress

0 Karma

andra_pietraru
Path Finder

Until now I was trying to write a command using xpath, but was not working. I see that spath is much more helpful. Thanks!

0 Karma

aweitzman
Motivator

Glad to hear it. Using that, you should be able to structure your search something like this:

source=myxmlfiles 
| spath 
| search switch.ipv4address="111.111.111.11"
| rename switch.ipv4address AS IPv4 switch.powereddevice.macaddress AS MACAddresses switch.powereddevice.port AS Ports
| table IPv4 MACAddresses Ports
0 Karma

andra_pietraru
Path Finder

How can I find all events that have that MACaddress as a value in their field?
What I want to achieve:
In a subsearch, based on ipv4address, I want to retrieve all MAC addresses. Then, in my main search, I want to find all events that have that MACaddress as a value for the ID field and make a timechart.
This is what I have so far:
sourcetype=perfdata [search sourcetype=xmlConfig switch.ipv4address=111.111.111.11 | rename switch.powereddevice.macaddress AS ID | return ID] | timechart span=5m values(field1)
But I'm doing something wrong, because I get no results back.
Do you have any suggestion?
Thanks!

0 Karma

aweitzman
Motivator

Use fields ID or table ID rather than return ID in your subsearch.

0 Karma

andra_pietraru
Path Finder

I tried. Still not getting any results back.

0 Karma

aweitzman
Motivator

Hm. What happens when you run the following search:

sourcetype=xmlConfig switch.ipv4address=111.111.111.11 | rename switch.powereddevice.macaddress AS ID | table ID | format

By adding | format to the end of it, you can see what the text of the subsearch will be without it running. Is it what you are expecting?

I just realized that ID is a multivalued field, and you might have to expand it. See if this closer to what you are expecting:

sourcetype=xmlConfig switch.ipv4address=111.111.111.11 | rename switch.powereddevice.macaddress AS ID | mvexpand ID | table ID | format

0 Karma

andra_pietraru
Path Finder

I tried adding mvexpand and now it returns all the IDs that I was expecting. Thanks so much!

0 Karma
Get Updates on the Splunk Community!

Webinar Recap | Revolutionizing IT Operations: The Transformative Power of AI and ML ...

The Transformative Power of AI and ML in Enhancing Observability   In the realm of IT operations, the ...

.conf24 | Registration Open!

Hello, hello! I come bearing good news: Registration for .conf24 is now open!   conf is Splunk’s rad annual ...

ICYMI - Check out the latest releases of Splunk Edge Processor

Splunk is pleased to announce the latest enhancements to Splunk Edge Processor.  HEC Receiver authorization ...