Can a subsearch do this? Or do I need a custom search command?

Contributor

Hi,

In my Splunk data (say) I've got a running list of customer purchases, with a customer ID number and an Item Number as part of the data. I'd like Splunk to show me sort of a "social networking" view of who is buying what, based upon a single initial purchase. (No, I don't work for Amazon...)

In other words, I'd like to work out all of the combinations and permutations of customer IDs and item numbers and display them all: "Show me all of the purchases made by customers who bought an iPhone, and for those non-iPhone purchases, show me who else bought those items (and so on)"...

So, something like:

1) Start with a single unique Item Number
2) Get a list of associated Customer IDs
3) Get a list of associated Item Numbers that each of the customers in (2) purchased
4) Repeat 2 and 3 until the list stops growing.
5) Display results

I know I can use a subsearch to get a complete list of things up through step (3), but I'm looking to go the next step: for each of the unique Item Numbers returned as part of step 3, feed it back through Splunk to see which customer IDs purchased those items, and so on.

Eventually, the list will stop "growing" as there are only so many customers and item numbers in my catalogue.

The ideal output would then tell me which item numbers were purchased by which customer IDs as matched in the full data, but all based upon a single initial purchase of a specific item. That's a key part to all this -- as I'm trying to get a picture of who bought what, when, based upon buying behaviour itself.

Hope that makes sense!


Path Finder

To achieve your step 4, you need to compute the transitive closure of the purchase relationship among products. For example, if X is the set of products and x R y means "purchasers of product x also purchased product y", then the transitive closure of R on X is the relation R+, where x R+ y holds whenever y can be reached from x through a chain of such co-purchases. Informally, R+ reads as "we hope purchasers of product x have some interest in product y." See Wikipedia for more information.

You could create a custom search command to compute the transitive closure.
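As a rough illustration of the core logic such a command would need, here is a minimal Python sketch. It is not a Splunk custom search command and does not use the Splunk SDK; it only shows the closure computation, assuming the purchases are available as (customer, product) pairs. The function name `purchase_closure` and the sample data are made up for illustration.

```python
from collections import defaultdict, deque


def purchase_closure(purchases, start_product):
    """Return (customers, products) reachable from start_product.

    purchases: iterable of (customer, product) pairs (assumed input shape).
    """
    buyers = defaultdict(set)   # product -> customers who bought it
    basket = defaultdict(set)   # customer -> products they bought
    for customer, product in purchases:
        buyers[product].add(customer)
        basket[customer].add(product)

    seen_products = {start_product}
    seen_customers = set()
    queue = deque([start_product])
    while queue:
        product = queue.popleft()
        # Step 2: customers associated with this product
        for customer in buyers.get(product, ()):
            if customer in seen_customers:
                continue
            seen_customers.add(customer)
            # Step 3: products associated with that customer
            for other in basket[customer]:
                if other not in seen_products:
                    seen_products.add(other)
                    queue.append(other)  # Step 4: repeat for new products
    return seen_customers, seen_products


# Made-up sample data, for illustration only
sample = [
    ("c1", "iphone"), ("c1", "case"),
    ("c2", "case"), ("c2", "charger"),
    ("c3", "toaster"),
]
customers, products = purchase_closure(sample, "iphone")
print(sorted(customers))  # ['c1', 'c2']
print(sorted(products))   # ['case', 'charger', 'iphone']
```

Each customer and product is visited at most once, so the loop ends as soon as no new IDs turn up, which is the "until the list stops growing" condition from the question.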

I would guess that the transitive closure would grow to include most products and customers if:

  • You sell the same product to many different customers.
  • Many customers buy a variety of products.

You can explore several iterations using Splunk lookup tables. First, create a lookup table that records all of the purchases (mapping your customer and product ID numbers):

... | table Customer, Product | dedup Customer, Product | outputlookup purchases.csv

Now you can look up a product number and find all the customers who purchased it. Starting with a search (shown as ...) for a single product number:

... | lookup purchases.csv Product OUTPUT Customer as Customers

Customers is a multi-valued field. The mvexpand command will create multiple "events," one for each customer:

... | lookup purchases.csv Product OUTPUT Customer as Customers | mvexpand Customers

Now you can use the same lookup table to find the products purchased by those customers and clean up the results:

... | lookup purchases.csv Product OUTPUT Customer as Customers | mvexpand Customers
| lookup purchases.csv Customer as Customers OUTPUT Product as Products | mvexpand Products
| table Products | dedup Products

That completes the first iteration of product --> customers --> products. You can append additional iterations to the same search. Each iteration looks like this:

| lookup purchases.csv Product as Products OUTPUT Customer as Customers | mvexpand Customers
| lookup purchases.csv Customer as Customers OUTPUT Product as Products | mvexpand Products
| table Products | dedup Products
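If you want a feel for how many iterations to append before the product list stops growing, you can simulate the same product --> customers --> products pass outside Splunk. The following is only a sketch in Python, assuming the contents of purchases.csv have been loaded as (Customer, Product) pairs; `rounds_until_stable` and the sample data are hypothetical names of mine, not anything Splunk provides.

```python
def rounds_until_stable(purchases, start_product):
    """Count product -> customers -> products passes that add new products.

    purchases: iterable of (customer, product) pairs (assumed input shape).
    Returns (rounds, products) once a pass adds nothing new.
    """
    pairs = set(purchases)          # same effect as dedup Customer, Product
    products = {start_product}
    rounds = 0
    while True:
        # product -> customers
        customers = {c for c, p in pairs if p in products}
        # customers -> products; the union keeps the set from shrinking
        expanded = products | {p for c, p in pairs if c in customers}
        if expanded == products:
            return rounds, products
        products = expanded
        rounds += 1


# Made-up sample data, for illustration only
sample = [
    ("c1", "iphone"), ("c1", "case"),
    ("c2", "case"), ("c2", "charger"),
    ("c3", "toaster"),
]
rounds, products = rounds_until_stable(sample, "iphone")
print(rounds)            # 2
print(sorted(products))  # ['case', 'charger', 'iphone']
```

With this sample data, two passes add products and a third adds nothing, so the first iteration plus one appended iteration would already give the full list.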

Good luck.
