In an attempt to speed up long running searches I Created a data model (my first) from a single index where the sources are sales_item (invoice line level detail) sales_hdr (summary detail, type of sale) and sales_tracking (carrier and tracking).
Skipping a lot of detail and back story.. I got this search working with goal being to bring in POType from the sales_hdr, tracking info from sales_tracking and plug it in to each item line from sales_item based on Order which is common to all three nodes in the DM.
|tstats summariesonly=t
dc(sales_item.Material) as MaterialCount
dc(sales_item.OrderLine) as OrderLineCount
latest(sales_item.OrderLineStatus) as OrderLineStatus
latest(sales_item.DateCreated) as DateCreated
latest(sales_item.OrderLine) as OrderLine
from datamodel=SCM.sales_item where sales_item.DateCreated=* AND sales_item.Order=0137737819 by sales_item.Order sales_item.Material
| rename sales_item.Order AS Order
| eventstats sum(OrderLineCount) as OrderLineSum sum(MaterialCount) as MaterialSum by Order
| appendcols
[
| tstats summariesonly=t latest(sales_hdr.POType) AS POType from datamodel=SCM.sales_hdr where sales_hdr.CreationDateHdr=* AND sales_hdr.Order=0137737819 by sales_hdr.Order
| rename sales_hdr.Order AS Order]
| eventstats last(POType) as POType by Order
|appendcols
[| tstats summariesonly=t latest(sales_tracking.CarrierName) AS CarrierName latest(sales_tracking.TrackingNumber) AS TrackingNumber from datamodel=SCM.sales_tracking where sales_tracking.Order=0137737819 by sales_tracking.Order sales_tracking.OrderLine
| rename sales_tracking.Order AS Order
| eventstats latest(CarrierName) as CarrierName latest(TrackingNumber) as TrackingNumber by sales_tracking.OrderLine]
| streamstats count as Row by Order
This works great for a single order. But, as soon as I change the nodename.Order=0137737* in all three tstats lines this search returns empty columns, see output in the picture below. I know some of the "sum" columns in here are a little redundant but I am using them to validate the results of the search.
So my questions are.
Output of a * order number search. Where in this case I know that 0137737819 POType is Email. And If I search only that Order in all three tstats lines I get the expected result.
Thanks for reading.. I'll be here scratching my head.
One last note. I have tried using an |eval Order=coalesce to combine Order but this fails worse than the rename I am doing here.
appendcols
is almost never the right answer. It will fail in that query whenever you have more than one "material" per order, because it literally does not "align" based on any fields, it just appends the columns of the output to whichever record happens to come up in the order.
If you change the appendcols
to join
, then you might get a slightly more robust query.
Most likely, though you should revisit your assumptions and use append
, then stats
or eventstats
to massage the results together.
I'm not positive how your data is supposed to work - grouping by OrderLine and/or Material seems like there might be a mismatch. Here's what I think is a decent first cut, assuming that the data from the tracking record maps to the orderlines as if they were keys. If not, then adjust the final eventstats
keys to line the data up correctly.
| tstats summariesonly=t
dc(sales_item.Material) as MaterialCount
dc(sales_item.OrderLine) as OrderLineCount
latest(sales_item.OrderLineStatus) as OrderLineStatus
latest(sales_item.DateCreated) as DateCreated
latest(sales_item.OrderLine) as OrderLine
from datamodel=SCM.sales_item
where sales_item.DateCreated=* AND sales_item.Order=0137737819
by sales_item.Order sales_item.Material
| rename sales_item.Order AS Order
| eval rectype="detail"
| eventstats sum(OrderLineCount) as OrderLineSum sum(MaterialCount) as MaterialSum by Order
| rename COMMENT as "Add header data then throw away the header records"
| append [
| tstats summariesonly=t
latest(sales_hdr.POType) AS POType
from datamodel=SCM.sales_hdr
where sales_hdr.CreationDateHdr=* AND sales_hdr.Order=0137737819
by sales_hdr.Order
| rename sales_hdr.Order AS Order
| eval rectype="header"]
| eventstats last(POType) as POType by Order
| where rectype!="header"
| rename COMMENT as "Add tracking data then throw away the tracking records"
| append [
| tstats summariesonly=t
latest(sales_tracking.CarrierName) AS CarrierName
latest(sales_tracking.TrackingNumber) AS TrackingNumber
from datamodel=SCM.sales_tracking
where sales_tracking.Order=0137737819
by sales_tracking.Order sales_tracking.OrderLine
| rename sales_tracking.Order AS Order, sales_tracking.OrderLine AS OrderLine
| eval rectype="tracking"
]
| eventstats last(CarrierName ) as CarrierName last(TrackingNumber) as TrackingNumber by Order OrderLine
| where rectype!="tracking"
| streamstats count as Row by Order
It REALLY seems like this query ought to be based on individual Orderline lines, not grouped by Material first, to get a valid readout of what is shipping, unless there is some way that you really can be sure that all lines of a particular material in an order always ship at the same time. Please feel free to explain your system in more detail if you need more help, tehn we can get you on the road fastest.
appendcols
is almost never the right answer. It will fail in that query whenever you have more than one "material" per order, because it literally does not "align" based on any fields, it just appends the columns of the output to whichever record happens to come up in the order.
If you change the appendcols
to join
, then you might get a slightly more robust query.
Most likely, though you should revisit your assumptions and use append
, then stats
or eventstats
to massage the results together.
I'm not positive how your data is supposed to work - grouping by OrderLine and/or Material seems like there might be a mismatch. Here's what I think is a decent first cut, assuming that the data from the tracking record maps to the orderlines as if they were keys. If not, then adjust the final eventstats
keys to line the data up correctly.
| tstats summariesonly=t
dc(sales_item.Material) as MaterialCount
dc(sales_item.OrderLine) as OrderLineCount
latest(sales_item.OrderLineStatus) as OrderLineStatus
latest(sales_item.DateCreated) as DateCreated
latest(sales_item.OrderLine) as OrderLine
from datamodel=SCM.sales_item
where sales_item.DateCreated=* AND sales_item.Order=0137737819
by sales_item.Order sales_item.Material
| rename sales_item.Order AS Order
| eval rectype="detail"
| eventstats sum(OrderLineCount) as OrderLineSum sum(MaterialCount) as MaterialSum by Order
| rename COMMENT as "Add header data then throw away the header records"
| append [
| tstats summariesonly=t
latest(sales_hdr.POType) AS POType
from datamodel=SCM.sales_hdr
where sales_hdr.CreationDateHdr=* AND sales_hdr.Order=0137737819
by sales_hdr.Order
| rename sales_hdr.Order AS Order
| eval rectype="header"]
| eventstats last(POType) as POType by Order
| where rectype!="header"
| rename COMMENT as "Add tracking data then throw away the tracking records"
| append [
| tstats summariesonly=t
latest(sales_tracking.CarrierName) AS CarrierName
latest(sales_tracking.TrackingNumber) AS TrackingNumber
from datamodel=SCM.sales_tracking
where sales_tracking.Order=0137737819
by sales_tracking.Order sales_tracking.OrderLine
| rename sales_tracking.Order AS Order, sales_tracking.OrderLine AS OrderLine
| eval rectype="tracking"
]
| eventstats last(CarrierName ) as CarrierName last(TrackingNumber) as TrackingNumber by Order OrderLine
| where rectype!="tracking"
| streamstats count as Row by Order
It REALLY seems like this query ought to be based on individual Orderline lines, not grouped by Material first, to get a valid readout of what is shipping, unless there is some way that you really can be sure that all lines of a particular material in an order always ship at the same time. Please feel free to explain your system in more detail if you need more help, tehn we can get you on the road fastest.
Your search worked great. However I ended up going with a join left
which returned the same results, only slightly faster.
Yeah you we're right about grouping by material and I knew that going in. What I also saw is that if an order had two lines with the same material that only counted as 1 "total" for the order. This is why I was getting a dc of line and materiel to see which one when grouped by the other matched up with individual order searches and lines.
The real rub of this data is that it is event based. So any time someone ticks a box in the order screen, or the carrier updates the shipping information a whole new data set for all that row is generated. Meaning, we have to search from the date created forward for all rows with that Order number, and only count/use the latest one for it. A single order of 10 lines can have 10k raw events in Splunk. Whats worse, is that tracking and header sources do not have a "datecreated" field that matches up with sales_item. This means that I can't limit the number rows searched when joining those others. While the first search runs quickly because it is usually only searching for Order=* DateCreated=20200520, the joins have to search the whole timespan for Order=*. On larger timespans this results in empty columns because Splunk reached the 50k row limit. Thankfully nobody has asked for that level of detail, these are mostly being used for Count-by pie charts and "top".
appendcols
should be fine as long as all appended searches return the same number of results in the same order as the main search. To be clear, the first result returned by the appended search should be directly related to the first result of the main search, and so on for each result.
| tstats summariesonly=t
dc(sales_item.Material) as MaterialCount
dc(sales_item.OrderLine) as OrderLineCount
latest(sales_item.OrderLineStatus) as OrderLineStatus
latest(sales_item.DateCreated) as DateCreated
latest(sales_item.OrderLine) as OrderLine
latest(sales_hdr.POType) AS POType
latest(sales_tracking.CarrierName) AS CarrierName
latest(sales_tracking.TrackingNumber) AS TrackingNumber
from datamodel=SCM.sales_item where sales_item.Order=0137737* OR sales_hdr.Order=0137737* OR sales_tracking.Order=0137737* by sales_item.Order sales_item.Material sales_hdr.Order sales_tracking.Order sales_tracking.OrderLine
Why don't you try searching the whole thing once and then try to fix it?
@to4kawa I would love to do all of this in one line. The issue there is the "from datamodel=
" portion. If I leave this as the root of the datamodel "from datamodel=SCM"
I only get results from sales_hdr, I guess because sales_hdr is the first branch in the tree. So to get results from sales_item I have to specify "from datamodel=SCM.sales_item
" which excludes the results from hdr and tracking.