Hi,
I am struggling with xml data in splunk and need help in mvzip /mvexpand command to store multi value pairs with different cardinality and expand them as value pair correctly.
I have a xml which looks as below .
<xml>
<Order>1-23445565 </Order>
<SubOrder></SubOrder>
<Product>Phone</Product>
<AppointmentID>1234</AppointmentID>
<SubOrder></SubOrder>
<Product>Mobile</Product>
<AppointmentID>12345</AppointmentID>
<SubOrder></SubOrder>
<Product>TV</Product>
<SubOrder></SubOrder>
<Product>Internet</Product>
<AppointmentID>123456</AppointmentID>
<xml>
I am trying to extract all the multivalue pair values ( Product, AppointmentID) and storing them using mvzip command, but when there is a null value it does not handle it well. I also tried with fillnull command which does not seem to work pretty well here .
My search looks like this at this moment to extract the key value pairs.
index="*"
|rex "<Order>(?P<OrderId>[^\<]+)"
|rex max_match=0 "<Product>(?P<tmpProduct>[^\<]+)"
|rex max_match=0 "<AppointmentID>(?P<tmpAppointmentID>[^\<]+)"
|eval field1=mvzip(tmpProduct,tmpAppointmentID)
| mvexpand field1
| rex field=field1 "(?<ProductName>[^,]+),(?<AppointmentID>[^,]+)"
| table OrderId,field1, ProductName,AppointmentID
In output I am getting below results which is incorrect.
OrderId field1 ProductName AppointmentID
1-23445565 Phone,1234 Phone 1234
1-23445565 Mobile,12345 Mobile 12345
1-23445565 TV,123456 TV 123456
I am missing one record with Product called "Internet" and the AppointmentID is wrongly getting appended to Product "TV" as the same product does not have any AppointmentID in the raw xml data.
Ideally My ouput should look like as below.
OrderId field1 ProductName AppointmentID
1-23445565 Phone,1234 Phone 1234
1-23445565 Mobile,12345 Mobile 12345
1-23445565 TV,"N/A" TV "N/A"
1-23445565 Internet, 123456
Any help here is much appreciated.
Your XML is terrible. Ideally, each product-appointment object would be contained in its own element.
That being said, the rex
approach cannot work. After extracting those two lists of values, there's no way to tell which value from one list is missing a value from the second - so there's no way to fill in that value later.
What you would need to do is split the _raw
text before extracting (untested, but the general approach should work):
index="*" |rex "<Order>(?P<OrderId>[^\<]+)"
| rex max_match=0 "(?s)(?<object><SubOrder.*?)(?=\s*<(SubOrder|/?xml))"
| mvexpand object
| rex field=object ...
Then extract product and appointment from the split-up object. This way the link between the fields is retained, and you can now tell what missing field belongs to what object.