Splunk Search

Help on calculating statistics

Sukisen1981
Champion

Hi, I have a CSV with something like the one shown. first field is order id and second field is product code.

ordr    prdcd
a              1
a              2
b              3
c             4
c             1
c             5
d             6
d             7
e             2
f             1
g             1
g             5

Ask- Look at order id 'a', it has 3 related products 1, 2 & 3. Ask is to find out if an order has a product , then what else could be the products that are usually purchased along with the same product?
For example, order ids c & g also have product code 1 , in addition both c and g order codes have linked product code 5, so for order id a, and product code 1 (first row) , we should be able to suggest to include product code 5 to be bundled / offered. However, when we come to the second row, there should be 0/no recommendations since the product code 2 is only being used in order id e and e in itself does not have any other linked product codes.
Another example are rows with order id f and order id c(row order id c product code 1). Here the recommendation should be 50% of order ids having product code 1 also have product code 5 AND 25% of order ids having product code 1 also have product code 2 & 3. Since order id a has linked products 2 & 3
I need this kind of recommendations in a third column for each row

Tags (1)
0 Karma

sashraf
New Member

Try appending this to the search that generates the data you have above:

| stats values(prdcd) as itemsordered by ordr | stats values(itemsordered) as recommendations by itemsordered | rename itemsordered as prdcd | nomv recommendations

Without seeing your original source it would be difficult to say for sure, however you would most likely be best to write this to a lookup file (using outputlookup) and keep the file updated via a scheduled search, that way you will be able to add an auto lookup to your original search.

0 Karma

Sukisen1981
Champion

Hi @sashraf thanks for the reply.
I had reached until this point earlier, I guess the stats is the easier part. However, the difficulty is I was not able to add a split by clause to get a percentage of the top linked products.

For example , in my above example look at product 1, it is present in order types a,c,f,g. Now, look at the OTHER products (except product 1) in these 4 orders. 'a' has product type 2 in addition to product type 1, 'c' has product types 4& 5 linked in addition to type 1, 'f' has no other products except type 1, lastly 'g' has type 5 linked other than one.
Hence, the expected recommendation in this case should be something like - 50% of order ids having product code 1 also have product code 5 AND 25% of order ids having product code 1 also have product code 2 & 3. Since order id a has linked products 2 & 4. I am not too worried about the text, but I do need an associated occurrence based count/percentage for the linked products, something like the preceding statement or at least something like
prd ordr linkedprd linkedprdcount
1 a 5 2
c 2 1
f 3 1
g

0 Karma
Get Updates on the Splunk Community!

Developer Spotlight with Paul Stout

Welcome to our very first developer spotlight release series where we'll feature some awesome Splunk ...

State of Splunk Careers 2024: Maximizing Career Outcomes and the Continued Value of ...

For the past four years, Splunk has partnered with Enterprise Strategy Group to conduct a survey that gauges ...

Data-Driven Success: Splunk & Financial Services

Splunk streamlines the process of extracting insights from large volumes of data. In this fast-paced world, ...