
Using Machine Learning Toolkit - Predict Categorical Variables Assistant to predict a field from a lookup table based on event log data

New Member

Is it possible to join a lookup table that contains a categorical field to an event log search, so that I can use customers' online behavior (e.g. sites visited, times visited) to predict customer churn? My lookup table is made up of standard customer information and demographics, with a column called withdrawn that marks recently withdrawn or lost customers. I can successfully use the MLTK assistant to predict the categorical variable withdrawn from other columns in the same lookup table, but that's not what I need. When I attempt to join it to a regular log search, such as index=main ... | lookup customers.csv, it returns results but will not let me choose my predictor columns the way it usually does.

Splunk Employee

Can you share your search?
I suspect you need to use ... | lookup nameofclookup.csv ... OUTPUTNEW ..., as in any SPL search.
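
For reference, a minimal sketch of that pattern, using the index, lookup file, and withdrawn column named in the question. The key field customer_id is hypothetical; substitute whichever field exists both in the events and in the lookup. OUTPUTNEW only fills withdrawn where the event does not already carry that field.

```spl
index=main
| lookup customers.csv customer_id OUTPUTNEW withdrawn
| table _time customer_id withdrawn
```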

New Member

Ok so the following code correctly joins my lookup table to my search in the search app:

index = sso | lookup StudentsFall2018.csv USER as user OUTPUT

And this next line lets me use the Predict Categorical Fields Classic Assistant in the MLTK app to predict the "withdrawn" column in the lookup table, but obviously only with predictor fields that are also in that same lookup table.

| inputlookup StudentsFall2018.csv

So theoretically I should be able to run the first line in the MLTK app and use logistic regression or another algorithm to predict the "withdrawn" field from the lookup table, except that now I should be able to use fields from the events retrieved by my search (such as source or url) instead of only fields in the lookup table. But I cannot. The process runs, completes, and returns events just like it should, but when it's time to select the field I want to predict, it is greyed out and cannot be selected.

Ultimately I am trying to predict whether a customer is likely to withdraw based on which company sites/pages/services the customer visits and how often they visit, using the url and other information contained in the events retrieved by the search. I don't need Splunk to predict from information that is only in my lookup table, as I have already been doing that with R, SPSS, and other stats tools. Where Splunk differs from those tools is its ability to analyze the customer's online behavior, if I could just figure out how. Thanks for your help!
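
One way the goal described above might be expressed directly in SPL is to summarize the events to one row per user before attaching the label. This is only a sketch and is not confirmed anywhere in this thread: it assumes the events carry user and url fields, and the names visits, distinct_urls, and churn_sketch are hypothetical. The fit command is only available where the MLTK is installed.

```spl
index=sso
| stats count AS visits dc(url) AS distinct_urls BY user
| lookup StudentsFall2018.csv USER AS user OUTPUTNEW withdrawn
| where isnotnull(withdrawn)
| fit LogisticRegression withdrawn from visits distinct_urls into churn_sketch
```

Aggregating first means each customer contributes a single training row, rather than one row per raw event that all share the same withdrawn value.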

Splunk Employee

Sorry I was traveling and didn't see you had replied.

You should be able to write whatever SPL you want into the Assistant search bar and use the resulting events in your ML workflows. When you put
index = sso | lookup StudentsFall2018.csv USER as user OUTPUT
into a Splunk search bar in Search & Reporting, what results do you get? You should get every event from index=sso, enriched with the lookup StudentsFall2018.csv based on the key "USER". When you run the same search from the search bar in the MLTK, do you get the same results? Does the lookup file StudentsFall2018.csv have permissions that allow it to be used inside the MLTK app? These are knowledge object (permissions) questions you need to check.
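
One possible way to see how the lookup file is shared is to query the REST endpoint for lookup table files. This is a sketch that assumes your role is allowed to run the rest command and that the file was uploaded under the name used above.

```spl
| rest /servicesNS/-/-/data/lookup-table-files splunk_server=local
| search title="StudentsFall2018.csv"
| table title eai:acl.app eai:acl.owner eai:acl.sharing
```

If eai:acl.sharing is not global and eai:acl.app is not the MLTK app, the lookup is probably not visible from inside the MLTK.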

The only reason I can think of for the "withdrawn" field to be greyed out in the Assistant is that it is entirely filled with null values; if you are merging the data correctly, as in the example above, that should not be the case.
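
A quick way to test for that, sketched under the assumption that the lookup key and column names are the ones used in this thread, is to compare the total event count with the number of events that actually received a withdrawn value (count(withdrawn) only counts events where the field is present):

```spl
index=sso
| lookup StudentsFall2018.csv USER AS user OUTPUT withdrawn
| stats count AS total_events count(withdrawn) AS events_with_withdrawn
```

If events_with_withdrawn is 0 when this runs inside the MLTK app, the lookup is not matching there.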

Do you have a way to share some of the data, or can you contact your sales team to arrange a WebEx/remote enablement session?

New Member

I think we got it. There were some global permissions issues that I had to get the admin to change for me. Also, I have to keep my time picker at 4 hours or less, but it works now. Thanks for your help!

New Member

No worries. When I put index = sso | lookup StudentsFall2018.csv USER as user OUTPUT
into a Splunk search bar in Search & Reporting, it works fine: events are returned that include my lookup columns as fields, as they should.

When I run the same line in the MLTK search bar, I get the same results.

But when I run the same line in the Predict Assistant inside the MLTK, it runs fine, matches events, and finalizes the job, but it will not let me choose the field to predict or the fields to predict with. All are greyed out.

I checked the permissions, and I think this is the problem: I am not allowed to share the lookup with other apps (that option is greyed out as well, lol), since I don't have admin privileges on Enterprise. So I created another lookup with the same name plus _2 and added it to the MLTK, but still nothing.

There are no nulls in the withdrawn column.

The data is a bit sensitive so I'm not sure if I could. This is also a grad school project and the semester is just about up anyway. However, it would be pretty sweet to get it fixed by Monday if possible. I can perhaps set up a similar situation in my own install of Splunk and remote you in on that one.
