Complex - to me - report needed

bowesmana · ‎11-08-2013

I am doing some online course completion analysis and am trying to get a report which seems to me to be quite difficult to achieve. I have data from 2 sources, one containing the learners in the system and which company they belong to and the other a list of courses completed by those learners.

A simplified example of a learners record is

userid,name,learner_type,company,

and course completions is

course,userid,name,completed

There are 3 different learner types, "Direct sales", "Indirect sales", "Other" and I want to ignore Other in this report.

I have 2 sourcetypes,

learners
completions

What I want to get out is a report that shows

Course
Company
Number of passes
Total number of learners in the company
Percentage complete as total learners by # passes

It needs to group the report by course, so the course name is only reported once and there is a line for each of the companies with the 3 additional columns of data.

I have got as far as two types of splunk command, each of which goes part way to solve the problem. The first

sourcetype=completions learner_type!="other" | stats count(learner_type) as Passed by course, company | chart list(company) AS Company, list(Passed) as Passed by course

That gives me the courses and within each course there are 2 columns, 1 the company and 2, the number of courses passed for that company. Cool, but not enough...

The second uses transaction to group by course

sourcetype=completions learner_type!="other" | transaction course | sort course | table course, company

That gives me a table with course and companies, with just a single line for course.

So, the first is missing the count of learners for each company, which must somehow come from the learners source_type as well as the % complete.

The second is missing most stuff, but is an example of partially achieving what I wanted with the transaction command, which I had not used before.

I feel like I need to do an initial count of the learners for each company and pass that onto the rest of the search pipeline, but am now stuck on how to progress...

Any help much appreciated.

lguinn2 · ‎11-08-2013

There are several ways to approach this.

First approach: Use a lookup table. This method will scale nicely. It also leverage the fact that Splunk is well-suited to manipulate time-based data. That might not be important today, but... In this method, you continue to index the completions, but the learner information becomes a lookup table (which is not indexed).

If you place the learner data in a lookup table, you could have a learner.csv file that looks like this

userid,name,learner_type,company
djones,Dee Jones,employee,Big Data Inc.
etc.

Once you set up the lookup (tutorial here), you could run the search like this:

sourcetype=completions 
| lookup learnerLookup userid OUTPUT learner_type company
| where learner_type!="Other" 
| eventstats dc(userid) as LearnerCount by company
| stats count as Passed avg(LearnerCount) as Learners by course, company
| eval PercentComplete = round(Passed*100/Learners,2)

Second approach: Use a join Joins in Splunk are a little different than joins in SQL. It won't work if there are more than 10,000 results in the subsearch. For this, you would keep your learner data in Splunk as you do now.

sourcetype=completions 
| join userid [ search sourcetype=learners learner_type!="Other" | fields userid company ]
| eventstats dc(userid) as LearnerCount by company
| stats count as Passed avg(LearnerCount) as Learners by course, company
| eval PercentComplete = round(Passed*100/Learners,2)

A final point - how will you update your learner data in the future? Splunk does not allow you to update indexed data - you would have to delete it and then index new data. A lookup table can be reloaded at any time, without any impact on your Splunk license.

bowesmana · ‎11-08-2013

Thanks Iguinn, option 1 is not possible, the learners change all the time and a monthly snapshot of learners is taken and added to the index.

So, option 2 is taking me forward, but not giving me quite what I want. I need to look at why it is giving me what it is then come back to this, but thanks for far, that's really helpful.

Complex - to me - report needed

Join the Splunk Community Slack to learn, troubleshoot, and make connections with fellow Splunk practitioners in real time!

Join Splunk User Groups to connect and learn in-person by region or remotely by topic or industry.

Automated Threat Analysis: Available in ES Premier

What’s New in Splunk AI: Volume 02

Best Practices: Splunk auto adjust pipeline queue

Join the Conversation

Complex - to me - report needed

Join the Splunk Community Slack to learn, troubleshoot, and make connections with fellow Splunk practitioners in real time!

Join Splunk User Groups to connect and learn in-person by region or remotely by topic or industry.

Automated Threat Analysis: Available in ES Premier

What’s New in Splunk AI: Volume 02

Best Practices: Splunk auto adjust pipeline queue