Solved: Complex search with set operation

bowesmana · ‎01-05-2014

I have 3 sourcetypes representing learners, courses and course completion details

learners - the learner details
catalog - the total course catalog for each learner
completions - the course completions for the learer

Amongst other fields,

"learners" contains Username, Name
"catalog" contains Username, Asset_Title
"completions" contains Username, Asset_Title

I want to produce a report showing each learner, the total number of courses in their catalog and the total number of courses they have completed along with a % completion. The following search almost gets there.

| set intersect [ search sourcetype=catalog* | fields Asset_Title, Username | fields - _* ] [ search sourcetype=completions* | fields Asset_Title, Username | fields - _* ] | eventstats dc(Asset_Title) as Count by Username | join Username [ search sourcetype=learners* ] | stats Count by Name

but I can't see how to get the total courses from the catalog for each learner to be retained as a field after the set operation, so I can add that as a Total field and calculate the %.

lguinn2 · ‎01-05-2014

Why not this? It seems much more simple

Username=* 
| stats count(eval(match(sourcetype,"catalog.*"))) as Catalog  count(eval(match(sourcetype,"completion.*"))) as Completed first(Name) as Name by Username

Although it looks like you expect to see duplicate Asset_Titles, so maybe this would work better

sourcetype=learners
| append [ search sourcetype=catalog | stats dc(Asset_Title) as CatalogCount by Username ]
| append [ search sourcetype=completions | stats dc(Asset_Title) as CompletionCount by Username ]
| stats sum(CatalogCount) as Total sum(CompletionCount) as Completed first(Name) as Name by Username
| eval PercentCompleted=round(Completed * 100 / Total, 2)
| table Name Username Completed Total PercentCompleted
| sort Name

I think either of these will run faster and be less complex than all the joins...

View solution in original post

lguinn2 · ‎01-05-2014

Why not this? It seems much more simple

Username=* 
| stats count(eval(match(sourcetype,"catalog.*"))) as Catalog  count(eval(match(sourcetype,"completion.*"))) as Completed first(Name) as Name by Username

Although it looks like you expect to see duplicate Asset_Titles, so maybe this would work better

sourcetype=learners
| append [ search sourcetype=catalog | stats dc(Asset_Title) as CatalogCount by Username ]
| append [ search sourcetype=completions | stats dc(Asset_Title) as CompletionCount by Username ]
| stats sum(CatalogCount) as Total sum(CompletionCount) as Completed first(Name) as Name by Username
| eval PercentCompleted=round(Completed * 100 / Total, 2)
| table Name Username Completed Total PercentCompleted
| sort Name

I think either of these will run faster and be less complex than all the joins...

bowesmana · ‎01-06-2014

Thanks, the 'big table' concept got it for me. I had tried the separate searches, but didn't understand the blank entries in the results after the appends, but when putting that into a table, I worked it out.

You've been a great help, you've got me over a mental hurdle. Thanks!

lguinn2 · ‎01-06-2014

Hopefully that helps. Or just run the first line of the search. Then run the first two lines of the search and see what happens. Then run the first 3 lines of the search... etc. etc.

lguinn2 · ‎01-06-2014

Try running each of these searches separately:

sourcetype=catalog | stats dc(Asset_Title) as CatalogCount by Username

sourcetype=completions | stats dc(Asset_Title) as CompletionCount by Username

sourcetype=learners | table Username Name

Then think about how you would combine these into one big table - maintaining a row for each event (some columns would be empty).
Now, with your table, imagine doing the following stats command:

stats sum(CatalogCount) as Total sum(CompletionCount) as Completed first(Name) as Name by Username

which creates a single row for each Username

bowesmana · ‎01-06-2014

Hi Iguinn, one thing I don't understand about Splunk is how the append works to append the CatalogCount and CompletionCount values to the right learner result. How is it that the CatalogCount by Username will get correctly associated with the relevant Username record from the sourcetype=learners.

I'm still missing something fundamental in how the search pipeline works I think. Can you clarify how this works.

Thanks!

lguinn2 · ‎01-06-2014

Bah - I just figured out another problem with my first search...

lguinn2 · ‎01-06-2014

Sorry about the typo in the append command - I corrected my answer. Thanks!

bowesmana · ‎01-05-2014

Thanks iguinn! The first didn't work - I actually have a couple of variants of sourcetype, i.e. I have completions_standard and completions_script, so I normally use completions* but the stats command doesn't like count(sourcetype=completions*).

However, the second seems to work - although it needed the 'search' in the append subsearch, i.e.

append [ search sourcetype...

Complex search with set operation

Join the Splunk Community Slack to learn, troubleshoot, and make connections with fellow Splunk practitioners in real time!

Join Splunk User Groups to connect and learn in-person by region or remotely by topic or industry.

Splunk Asynchronous Forwarding Explained

55 Days to Go: Secure Your Seat at Splunk University in Denver

(re)Introducing the Splunk Community Champions + 2026 – 2027 Splunk MVPs ...

Join the Conversation