Re: Help on calculating statistics

Sukisen1981 · ‎02-11-2018

Hi, I have a CSV with something like the one shown. first field is order id and second field is product code.

ordr    prdcd
a              1
a              2
b              3
c             4
c             1
c             5
d             6
d             7
e             2
f             1
g             1
g             5

Ask- Look at order id 'a', it has 3 related products 1, 2 & 3. Ask is to find out if an order has a product , then what else could be the products that are usually purchased along with the same product?
For example, order ids c & g also have product code 1 , in addition both c and g order codes have linked product code 5, so for order id a, and product code 1 (first row) , we should be able to suggest to include product code 5 to be bundled / offered. However, when we come to the second row, there should be 0/no recommendations since the product code 2 is only being used in order id e and e in itself does not have any other linked product codes.
Another example are rows with order id f and order id c(row order id c product code 1). Here the recommendation should be 50% of order ids having product code 1 also have product code 5 AND 25% of order ids having product code 1 also have product code 2 & 3. Since order id a has linked products 2 & 3
I need this kind of recommendations in a third column for each row

sashraf · ‎02-11-2018

Try appending this to the search that generates the data you have above:

| stats values(prdcd) as itemsordered by ordr | stats values(itemsordered) as recommendations by itemsordered | rename itemsordered as prdcd | nomv recommendations

Without seeing your original source it would be difficult to say for sure, however you would most likely be best to write this to a lookup file (using outputlookup) and keep the file updated via a scheduled search, that way you will be able to add an auto lookup to your original search.

Sukisen1981 · ‎02-13-2018

Hi @sashraf thanks for the reply.
I had reached until this point earlier, I guess the stats is the easier part. However, the difficulty is I was not able to add a split by clause to get a percentage of the top linked products.

For example , in my above example look at product 1, it is present in order types a,c,f,g. Now, look at the OTHER products (except product 1) in these 4 orders. 'a' has product type 2 in addition to product type 1, 'c' has product types 4& 5 linked in addition to type 1, 'f' has no other products except type 1, lastly 'g' has type 5 linked other than one.
Hence, the expected recommendation in this case should be something like - 50% of order ids having product code 1 also have product code 5 AND 25% of order ids having product code 1 also have product code 2 & 3. Since order id a has linked products 2 & 4. I am not too worried about the text, but I do need an associated occurrence based count/percentage for the linked products, something like the preceding statement or at least something like
prd ordr linkedprd linkedprdcount 1 a 5 2 c 2 1 f 3 1 g

Help on calculating statistics

Join the Splunk Community Slack to learn, troubleshoot, and make connections with fellow Splunk practitioners in real time!

Join Splunk User Groups to connect and learn in-person by region or remotely by topic or industry.

Why Splunk Customers Should Attend Cisco Live 2026 Las Vegas

What Is the Name of the USB Key Inserted by Bob Smith? (BOTS Hint, Not the Answer)

Automating Threat Operations and Threat Hunting with Recorded Future

Join the Conversation