Splunk Search

Dynamic Tag values etraction using mvzip

ipsitam
New Member

Hi,

I am struggling with xml data in splunk and need help in mvzip /mvexpand command to store multi value pairs with different cardinality and expand them as value pair correctly.

I have a xml which looks as below .

  <xml>
  <Order>1-23445565 </Order>
  <SubOrder></SubOrder>
  <Product>Phone</Product>
  <AppointmentID>1234</AppointmentID>
  <SubOrder></SubOrder>
  <Product>Mobile</Product>
  <AppointmentID>12345</AppointmentID>
  <SubOrder></SubOrder>
  <Product>TV</Product>
  <SubOrder></SubOrder>
  <Product>Internet</Product>
  <AppointmentID>123456</AppointmentID>
  <xml>

I am trying to extract all the multivalue pair values ( Product, AppointmentID) and storing them using mvzip command, but when there is a null value it does not handle it well. I also tried with fillnull command which does not seem to work pretty well here .

My search looks like this at this moment to extract the key value pairs.

 index="*" 
  |rex "<Order>(?P<OrderId>[^\<]+)"
  |rex max_match=0 "<Product>(?P<tmpProduct>[^\<]+)"
  |rex max_match=0 "<AppointmentID>(?P<tmpAppointmentID>[^\<]+)"
  |eval field1=mvzip(tmpProduct,tmpAppointmentID)
  | mvexpand field1
  | rex field=field1 "(?<ProductName>[^,]+),(?<AppointmentID>[^,]+)"
  | table OrderId,field1, ProductName,AppointmentID

In output I am getting below results which is incorrect.

  OrderId    field1    ProductName    AppointmentID
  1-23445565    Phone,1234    Phone    1234
  1-23445565    Mobile,12345    Mobile    12345
  1-23445565    TV,123456    TV    123456

I am missing one record with Product called "Internet" and the AppointmentID is wrongly getting appended to Product "TV" as the same product does not have any AppointmentID in the raw xml data.

Ideally My ouput should look like as below.

  OrderId    field1    ProductName    AppointmentID
  1-23445565    Phone,1234    Phone    1234
  1-23445565    Mobile,12345    Mobile    12345
  1-23445565    TV,"N/A"        TV   "N/A"
  1-23445565    Internet,        123456

Any help here is much appreciated.

0 Karma

martin_mueller
SplunkTrust
SplunkTrust

Your XML is terrible. Ideally, each product-appointment object would be contained in its own element.

That being said, the rex approach cannot work. After extracting those two lists of values, there's no way to tell which value from one list is missing a value from the second - so there's no way to fill in that value later.

What you would need to do is split the _raw text before extracting (untested, but the general approach should work):

  index="*" |rex "<Order>(?P<OrderId>[^\<]+)"
| rex max_match=0 "(?s)(?<object><SubOrder.*?)(?=\s*<(SubOrder|/?xml))"
| mvexpand object
| rex field=object ...

Then extract product and appointment from the split-up object. This way the link between the fields is retained, and you can now tell what missing field belongs to what object.

Get Updates on the Splunk Community!

What's new in Splunk Cloud Platform 9.1.2312?

Hi Splunky people! We are excited to share the newest updates in Splunk Cloud Platform 9.1.2312! Analysts can ...

What’s New in Splunk Security Essentials 3.8.0?

Splunk Security Essentials (SSE) is an app that can amplify the power of your existing Splunk Cloud Platform, ...

Let’s Get You Certified – Vegas-Style at .conf24

Are you ready to level up your Splunk game? Then, let’s get you certified live at .conf24 – our annual user ...