Splunk Search

Complex search with set operation

bowesmana
SplunkTrust
SplunkTrust

I have 3 sourcetypes representing learners, courses and course completion details

learners - the learner details
catalog - the total course catalog for each learner
completions - the course completions for the learer

Amongst other fields,

"learners" contains Username, Name
"catalog" contains Username, Asset_Title
"completions" contains Username, Asset_Title

I want to produce a report showing each learner, the total number of courses in their catalog and the total number of courses they have completed along with a % completion. The following search almost gets there.

| set intersect [ search sourcetype=catalog* | fields Asset_Title, Username | fields - _* ] [ search sourcetype=completions* | fields Asset_Title, Username | fields - _* ] | eventstats dc(Asset_Title) as Count by Username | join Username [ search sourcetype=learners* ] | stats Count by Name

but I can't see how to get the total courses from the catalog for each learner to be retained as a field after the set operation, so I can add that as a Total field and calculate the %.

Tags (2)
0 Karma
1 Solution

lguinn2
Legend

Why not this? It seems much more simple

Username=* 
| stats count(eval(match(sourcetype,"catalog.*"))) as Catalog  count(eval(match(sourcetype,"completion.*"))) as Completed first(Name) as Name by Username

Although it looks like you expect to see duplicate Asset_Titles, so maybe this would work better

sourcetype=learners
| append [ search sourcetype=catalog | stats dc(Asset_Title) as CatalogCount by Username ]
| append [ search sourcetype=completions | stats dc(Asset_Title) as CompletionCount by Username ]
| stats sum(CatalogCount) as Total sum(CompletionCount) as Completed first(Name) as Name by Username
| eval PercentCompleted=round(Completed * 100 / Total, 2)
| table Name Username Completed Total PercentCompleted
| sort Name

I think either of these will run faster and be less complex than all the joins...

View solution in original post

lguinn2
Legend

Why not this? It seems much more simple

Username=* 
| stats count(eval(match(sourcetype,"catalog.*"))) as Catalog  count(eval(match(sourcetype,"completion.*"))) as Completed first(Name) as Name by Username

Although it looks like you expect to see duplicate Asset_Titles, so maybe this would work better

sourcetype=learners
| append [ search sourcetype=catalog | stats dc(Asset_Title) as CatalogCount by Username ]
| append [ search sourcetype=completions | stats dc(Asset_Title) as CompletionCount by Username ]
| stats sum(CatalogCount) as Total sum(CompletionCount) as Completed first(Name) as Name by Username
| eval PercentCompleted=round(Completed * 100 / Total, 2)
| table Name Username Completed Total PercentCompleted
| sort Name

I think either of these will run faster and be less complex than all the joins...

bowesmana
SplunkTrust
SplunkTrust

Thanks, the 'big table' concept got it for me. I had tried the separate searches, but didn't understand the blank entries in the results after the appends, but when putting that into a table, I worked it out.

You've been a great help, you've got me over a mental hurdle. Thanks!

0 Karma

lguinn2
Legend

Hopefully that helps. Or just run the first line of the search. Then run the first two lines of the search and see what happens. Then run the first 3 lines of the search... etc. etc.

0 Karma

lguinn2
Legend

Try running each of these searches separately:

sourcetype=catalog | stats dc(Asset_Title) as CatalogCount by Username

sourcetype=completions | stats dc(Asset_Title) as CompletionCount by Username

sourcetype=learners | table Username Name

Then think about how you would combine these into one big table - maintaining a row for each event (some columns would be empty).
Now, with your table, imagine doing the following stats command:

stats sum(CatalogCount) as Total sum(CompletionCount) as Completed first(Name) as Name by Username

which creates a single row for each Username

0 Karma

bowesmana
SplunkTrust
SplunkTrust

Hi Iguinn, one thing I don't understand about Splunk is how the append works to append the CatalogCount and CompletionCount values to the right learner result. How is it that the CatalogCount by Username will get correctly associated with the relevant Username record from the sourcetype=learners.

I'm still missing something fundamental in how the search pipeline works I think. Can you clarify how this works.

Thanks!

0 Karma

lguinn2
Legend

Bah - I just figured out another problem with my first search...

0 Karma

lguinn2
Legend

Sorry about the typo in the append command - I corrected my answer. Thanks!

0 Karma

bowesmana
SplunkTrust
SplunkTrust

Thanks iguinn! The first didn't work - I actually have a couple of variants of sourcetype, i.e. I have completions_standard and completions_script, so I normally use completions* but the stats command doesn't like count(sourcetype=completions*).

However, the second seems to work - although it needed the 'search' in the append subsearch, i.e.

append [ search sourcetype...
0 Karma
Got questions? Get answers!

Join the Splunk Community Slack to learn, troubleshoot, and make connections with fellow Splunk practitioners in real time!

Meet up IRL or virtually!

Join Splunk User Groups to connect and learn in-person by region or remotely by topic or industry.

Get Updates on the Splunk Community!

Deep Dive: Accelerate threat investigation with Splunk’s AI Assistant in Security

AI is one of the biggest topics in the market today, and for security teams, its value goes far beyond the ...

Announcing Modern Navigation: A New Era of Splunk User Experience

We are excited to introduce the Modern Navigation feature in the Splunk Platform, available to both cloud and ...

Detection Engineering Office Hours: Real-World Troubleshooting & Q&A

[REGISTER HERE] This thread is for the Community Office Hours session on Detection Engineering Office Hours: ...