I am currently trying to use the Splunk REST API to extract a heap of data.
I have written a search query:
(index=* OR index=_*) (index="myindex") | table "order-no", "order-date", "price"
But it provides a result set which looks like this:
"Order-no", "order-date", "price","quantity"
1 , 20/05/2017 , 1.0, 1
--------------------, 2.0, 1
---------------------,3.0, 2
It rolls 2.0
and 3.0
up to 1 record along side row order-no
= 1
.
So when i try to export this as a CSV using REST API, the output is all jumbled up.
Is there a way to flatten this in the search so it shows as:
"Order-no", "order-date", "price","quantity"
1 , 20/05/2017 , 1.0, 1
1 , 20/05/2017 , 2.0, 1
1 , 20/05/2017 , 3.0, 2
I've tried to use mvexpand
, but it appears that it works on a individual column level and if i do a mvexpand
on both of those last 2 columns, it effectively does a cross join on the entire data which causes incorrect values
Thanks in advance.
Try this ...
(index=* OR index=_*) (index="myindex" "order{@order-no}"="1" "order.customer.customer-no"="*")
| rename order{@order-no} as "order-no",
order.order-date as "order-date",
order.currency as "currency",
order.shipments.shipment.shipping-address.country-code as "country-code",
order.product-lineitems.product-lineitem.product-id as "productid",
order.product-lineitems.product-lineitem.quantity as "quantity",
order.product-lineitems.product-lineitem.base-price as "baseprice",
order.product-lineitems.product-lineitem.net-price as "netprice",
order.product-lineitems.product-lineitem.price-adjustments.price-adjustment.promotion-id as "adjustpromotionid",
order.product-lineitems.product-lineitem.price-adjustments.price-adjustment.campaign-id as "adjustcampaignid",
order.product-lineitems.product-lineitem.price-adjustments.price-adjustment.coupon-id as "adjustcouponid",
order.product-lineitems.product-lineitem.price-adjustments.price-adjustment.lineitem-text as "adjustlineitemtext",
order.product-lineitems.product-lineitem.price-adjustments.price-adjustment.base-price as "adjustbaseprice",
order.product-lineitems.product-lineitem.tax as "tax",
order.customer.customer-no as "customer-no"
| streamstats count as recno
| eval myFan=mvrange(0,mvcount(quantity))
| mvexpand myFan
| eval productid=mvindex(productid,myFan)
| eval quantity=mvindex(quantity,myFan)
| eval baseprice=mvindex(baseprice,myFan)
| eval netprice=mvindex(netprice,myFan)
| rename COMMENT as "To accurately line up this part of the data, you should go back to extract"
| rename COMMENT as "the data directly from the JSON for each line-item by productid."
| rename COMMENT as "We would need a properly formatted JSON sample to code that extract from the JSON or the _raw,"
| rename COMMENT as "So this is just a placeholder method."
| eval adjustpromotionid=mvindex(adjustpromotionid,myFan)
| eval adjustcampaignid=mvindex(adjustcampaignid,myFan)
| eval adjustcouponid=mvindex(adjustcouponid,myFan)
| eval adjustlineitemtext=mvindex(adjustlineitemtext,myFan)
Like this:
|makeresults | eval orders="1.0,1:2.0,1:3.0,2"
| makemv delim=":" orders
| mvexpand orders
| rename orders AS _raw
| eval "Order-no" = 1
| rename _time AS "order-date"
| convert ctime(*date)
| rex "^(?<price>\S+),(?<quantity>\S+)$"
| fields - _raw
| stats list(*) AS * BY Order-no order-date
| rename COMMENT AS "Everything above generates sample event data; everything below is your solution"
| eval PsAndQs = mvzip(price,quantity)
| mvexpand PsAndQs
| rex field=PsAndQs "^(?<price>\S+),(?<quantity>\S+)$"
| fields - PsAndQs
Interesting, even your answer drops it when one of the columns within your mvzip is empty.
I think its because the multivalue field actually doesn't have a value for it, and doesn't know about the other columns. Even if i use a replace "" or fillnull on the original data, it doesn't fill the ones in MVs. Any idea how to fillnulls/replace "" in MV so it lines up with the data?
This makes no sense. How could a valid row or event ever have an undefined price
or an undefined quantity
and even if it did, who care if it is lost, it is totally invalid/nonsensical anyway. Something really doesn't add up here. If you run the full run-anywhere example, there is no question: it works.
Im adding a few more other MV columns which arent equal in rows. E.g. they may have nulls.
Now see, that makes a H*U*G*E difference. Just add | fillnull value="NULL"
.
attempting to fillnull, but doesn't seem to do anything, still only showing the ones which are not null for all of the fields in mvzip.
|fillnull value=NULL newfield
| eval PsAndQs = mvzip(mvzip(price,quantity),newfield)
| mvexpand PsAndQs
| rex field=PsAndQs "^(?\S+),(?\S+),(?\S+)$"
| fields - PsAndQs
Just show us a REAL starting event with ALL of the fields (one that ends up broken).
Input Table:
order-no order-date currency country-code quantity(MV) base-price(MV) net-price(MV) product-id(MV) customer-no adjust-coupon-id(MV)
1 2017-07-31T13:13:31.000Z AUD AU 1.0 30.00 27.27 2 123 BFFYAY
1.0 50.00 45.45 3 BFFYAY
1.0 45.00 40.91 4 BFFYAY
1.0 45.00 40.91 5 <NO VALUE>
Current Query to Flatten:
(index=* OR index=_*) (index="myindex" "order{@order-no}"="1" "order.customer.customer-no"="*")
| rename order{@order-no} as "order-no",
order.order-date as "order-date",
order.currency as "currency",
order.shipments.shipment.shipping-address.country-code as "country-code",
order.product-lineitems.product-lineitem.product-id as "productid",
order.product-lineitems.product-lineitem.quantity as "quantity",
order.product-lineitems.product-lineitem.base-price as "baseprice",
order.product-lineitems.product-lineitem.net-price as "netprice",
order.product-lineitems.product-lineitem.price-adjustments.price-adjustment.promotion-id as "adjustpromotionid",
order.product-lineitems.product-lineitem.price-adjustments.price-adjustment.campaign-id as "adjustcampaignid",
order.product-lineitems.product-lineitem.price-adjustments.price-adjustment.coupon-id as "adjustcouponid",
order.product-lineitems.product-lineitem.price-adjustments.price-adjustment.lineitem-text as "adjustlineitemtext",
order.product-lineitems.product-lineitem.price-adjustments.price-adjustment.base-price as "adjustbaseprice",
order.product-lineitems.product-lineitem.tax as "tax",
order.customer.customer-no as "customer-no"
| eval combine=mvzip(mvzip(mvzip(mvzip(coalesce(productid,""),coalesce(quantity,""),"!!#@#!!"),coalesce(baseprice,""),"!!#@#!!"),coalesce(netprice,""),"!!#@#!!"),coalesce(adjustcouponid,""),"!!#@#!!")
| mvexpand combine
| eval allvalues=split(combine,"!!#@#!!")
| eval "adjust-coupon-id"=mvindex(allvalues,4)
| eval "net-price"=mvindex(allvalues,3)
| eval "base-price"=mvindex(allvalues,2)
| eval "quantity"=mvindex(allvalues,1)
| eval "product-id"=mvindex(allvalues,0)
| table "order-no", "order-date", "currency", "country-code","quantity", "base-price", "net-price", "product-id", "customer-no", "adjust-coupon-id"
Result:
order-no order-date currency country-code quantity base-price net-price product-id customer-no adjust-coupon-id
1 2017-07-31T13:13:31.000Z AUD AU 1.0 30.00 27.27 2 00436933 BFFYAY
1 2017-07-31T13:13:31.000Z AUD AU 1.0 50.00 45.45 3 00436933 BFFYAY
1 2017-07-31T13:13:31.000Z AUD AU 1.0 45.00 40.91 4 00436933 BFFYAY
Assuming you have this format
| table order-no order-date price quantity
and price and quantity are both multivalue fields that are aligned with each other, then do this...
| eval price=mvzip(price,quantity,"@")
| mvexpand price
| eval price=split(price,"@")
| eval quantity=mvindex(price,1)
| eval price=mvindex(price,0)
If you had a third field, which might contain an @
sign, and you don't have people entering enthusiastic comments, then I recommend five bangs as a delimiter "!!!!!"
. You can also substitute any highly unlikely combination of characters ( "!!#@#!!"
).
| table order-no order-date price quantity description
| eval price=mvzip(mvzip(price,quantity,"!!!!!"),description,"!!!!!")
| mvexpand price
| eval price=split(price,"!!!!!")
| eval description=mvindex(price,2)
| eval quantity=mvindex(price,1)
| eval price=mvindex(price,0)
@shinglau - These methods are not appropriate for fields that are not precisely matched, with the same number of mv fields. If a field is supposed to be matched and is null, then you need to create a placeholder mv field, for example with the word "NULL" the correct number of times.
Assuming that the description may be null, the price and quantity are always matched and never null, and price and quantity can never appear more than 40 times, that could look something like...
| table order-no order-date price quantity description
| eval mynulls=mvappend("((UNUSED))",
"((NULL))","((NULL))","((NULL))","((NULL))","((NULL))", "((NULL))","((NULL))","((NULL))",
"((NULL))","((NULL))","((NULL))","((NULL))","((NULL))","((NULL))","((NULL))","((NULL))",
"((NULL))","((NULL))","((NULL))","((NULL))","((NULL))","((NULL))","((NULL))","((NULL))",
"((NULL))","((NULL))","((NULL))","((NULL))","((NULL))","((NULL))","((NULL))","((NULL))",
"((NULL))","((NULL))","((NULL))","((NULL))","((NULL))","((NULL))","((NULL))","((NULL))")
| eval description=coalesce(description,mvindex(mynulls,1,mvcount(price)))
| eval price=mvzip(mvzip(price,quantity,"!!!!!"),description,"!!!!!")
| mvexpand price
| eval price=split(price,"!!!!!")
| eval description=mvindex(price,2)
| eval description=if(description="((NULL))",null(),description)
| eval quantity=mvindex(price,1)
| eval price=mvindex(price,0)
If a field is not directly linked to the other fields (price, quantity, cost etc) then do not handle it in the same phase. If it IS directly linked to the other fields but is for some reason not aligned, then it requires a separate question with specifics so that we can handle that error appropriately.
That works for when the entire MV is null, but doesn't work for when there are 4 rows, but in the MV there is 3 of them which has a value, but the 4th one is non-existent.
Looks like when any of the columns within the mvzip is null / empty string, it drops the entire row. Anyway to fix this?
Sure, give my answer a try.
I tried this solution.
But it looks like it only returns the first record not all of them.
It should look like this:
"Order-no", "order-date", "price","quantity"
1 , 20/05/2017 , 1.0, 1
1 , 20/05/2017 , 2.0, 1
1 , 20/05/2017 , 3.0, 2
But instead it looks like this:
"Order-no", "order-date", "price","quantity"
1 , 20/05/2017 , 1.0, 1
Looks like using this solution only returns the first record, and doesn't return the other records within the group.
Should look like:
"Order-no", "order-date", "price","quantity"
1 , 20/05/2017 , 1.0, 1
1 , 20/05/2017 , 2.0, 1
1 , 20/05/2017 , 3.0, 2
Instead it returns this:
"Order-no", "order-date", "price","quantity"
1 , 20/05/2017 , 1.0, 1
I tried this solution but it keeps on giving me error:
Error in 'eval' command: The arguments to the 'mvzip' function are invalid.
Nevermind found out why..... Looks like eval doesn't like columns with "-" in it.
Had to rename the column without a "-" then reference it in mvzip and it worked.