Splunk Search

Dynamic Tag values etraction using mvzip

ipsitam
New Member

Hi,

I am struggling with xml data in splunk and need help in mvzip /mvexpand command to store multi value pairs with different cardinality and expand them as value pair correctly.

I have a xml which looks as below .

  <xml>
  <Order>1-23445565 </Order>
  <SubOrder></SubOrder>
  <Product>Phone</Product>
  <AppointmentID>1234</AppointmentID>
  <SubOrder></SubOrder>
  <Product>Mobile</Product>
  <AppointmentID>12345</AppointmentID>
  <SubOrder></SubOrder>
  <Product>TV</Product>
  <SubOrder></SubOrder>
  <Product>Internet</Product>
  <AppointmentID>123456</AppointmentID>
  <xml>

I am trying to extract all the multivalue pair values ( Product, AppointmentID) and storing them using mvzip command, but when there is a null value it does not handle it well. I also tried with fillnull command which does not seem to work pretty well here .

My search looks like this at this moment to extract the key value pairs.

 index="*" 
  |rex "<Order>(?P<OrderId>[^\<]+)"
  |rex max_match=0 "<Product>(?P<tmpProduct>[^\<]+)"
  |rex max_match=0 "<AppointmentID>(?P<tmpAppointmentID>[^\<]+)"
  |eval field1=mvzip(tmpProduct,tmpAppointmentID)
  | mvexpand field1
  | rex field=field1 "(?<ProductName>[^,]+),(?<AppointmentID>[^,]+)"
  | table OrderId,field1, ProductName,AppointmentID

In output I am getting below results which is incorrect.

  OrderId    field1    ProductName    AppointmentID
  1-23445565    Phone,1234    Phone    1234
  1-23445565    Mobile,12345    Mobile    12345
  1-23445565    TV,123456    TV    123456

I am missing one record with Product called "Internet" and the AppointmentID is wrongly getting appended to Product "TV" as the same product does not have any AppointmentID in the raw xml data.

Ideally My ouput should look like as below.

  OrderId    field1    ProductName    AppointmentID
  1-23445565    Phone,1234    Phone    1234
  1-23445565    Mobile,12345    Mobile    12345
  1-23445565    TV,"N/A"        TV   "N/A"
  1-23445565    Internet,        123456

Any help here is much appreciated.

0 Karma

martin_mueller
SplunkTrust
SplunkTrust

Your XML is terrible. Ideally, each product-appointment object would be contained in its own element.

That being said, the rex approach cannot work. After extracting those two lists of values, there's no way to tell which value from one list is missing a value from the second - so there's no way to fill in that value later.

What you would need to do is split the _raw text before extracting (untested, but the general approach should work):

  index="*" |rex "<Order>(?P<OrderId>[^\<]+)"
| rex max_match=0 "(?s)(?<object><SubOrder.*?)(?=\s*<(SubOrder|/?xml))"
| mvexpand object
| rex field=object ...

Then extract product and appointment from the split-up object. This way the link between the fields is retained, and you can now tell what missing field belongs to what object.

Get Updates on the Splunk Community!

Built-in Service Level Objectives Management to Bridge the Gap Between Service & ...

Wednesday, May 29, 2024  |  11AM PST / 2PM ESTRegister now and join us to learn more about how you can ...

Get Your Exclusive Splunk Certified Cybersecurity Defense Engineer Certification at ...

We’re excited to announce a new Splunk certification exam being released at .conf24! If you’re headed to Vegas ...

Share Your Ideas & Meet the Lantern team at .Conf! Plus All of This Month’s New ...

Splunk Lantern is Splunk’s customer success center that provides advice from Splunk experts on valuable data ...