Filtering with dedup depending on number of results

timmalos
Communicator

I have a search that monitors my NetBackup jobs in real time.

search = index=Infra_NB sourcetype="NbJobs" site=$site$ (NOT HiddenByOperator="*") | fillnull value="-" Client jobCopy Policy Schedule jobFileList | dedup Client Policy Schedule jobFileList sortby -_time | dedup jobId sortby -_time | search jobStatus>1 jobStatus!=150 | sort -_time | table Type Date site Client Policy Status

Most of the time I have between 2 and 20 results, but when there is a lot of trouble the list can grow considerably. What I would like is to add another dedup to the search (| dedup Client Policy Schedule sortby -_time), but only if there are more than 20 results.

I tried in this direction:

|eventstats count | eval test=if(count>20,DEDUP,nothing)

Thanks for your help,

Ayn
Legend

I imagine you could achieve something like this using a combination of eventstats and streamstats.

... | eventstats count as totalcount | streamstats count as dcount by Client,Policy,Schedule | where count>20 AND dcount<2

eventstats gets the total count of events, and streamstats assigns a running count within each combination of Client,Policy,Schedule. where then checks whether the condition count>20 has been met and, if so, keeps only the events where dcount<2, that is, only 1 event per combination you want to dedup on.
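
To see what the two counters look like, here is a small self-contained sketch on generated events. The makeresults rows, the helper field n and the Client values "a" and "b" are made up for illustration and are not from the original data; only one by-field is used to keep it short.

```
| makeresults count=5
| streamstats count as n
| eval Client=if(n<=3,"a","b")
| eventstats count as totalcount
| streamstats count as dcount by Client
| table Client totalcount dcount
```

Every row should carry the same totalcount (5 here), while dcount restarts at 1 for each Client value, so a filter on dcount<2 keeps the first row of each group in the current result order.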

Ayn
Legend

Ah yes, most definitely, I wrote the search off the top of my head so it was bound to have bugs in it right from the start - great that you got it working!

timmalos
Communicator

Thanks a lot. This is the corrected version (I also replaced where with search, since where is not needed here and, as far as I know, it is better to use search when possible):

|eventstats count|streamstats count as dcount by Client,Policy,Schedule|search (count>20 AND dcount<2) OR count<=20
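
Put together with the original search, the whole thing might look like the sketch below. The placement of the three new commands is an assumption: they sit after the jobStatus filter, so that count reflects the rows that are actually displayed, and before the final sort. The line breaks are only for readability.

```
index=Infra_NB sourcetype="NbJobs" site=$site$ (NOT HiddenByOperator="*")
| fillnull value="-" Client jobCopy Policy Schedule jobFileList
| dedup Client Policy Schedule jobFileList sortby -_time
| dedup jobId sortby -_time
| search jobStatus>1 jobStatus!=150
| eventstats count
| streamstats count as dcount by Client,Policy,Schedule
| search (count>20 AND dcount<2) OR count<=20
| sort -_time
| table Type Date site Client Policy Status
```

The extra OR count<=20 clause is what lets every row through when there are 20 results or fewer; without it, the filter would drop all rows in that case. Note that streamstats keeps whichever event comes first in the current order for each combination, which after the dedup ... sortby -_time steps should be the most recent one.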

rtadams89
Contributor

There isn't really any programmatic logic built into Splunk search commands to do this, but there may still be a way to accomplish your end goal.

What are you doing with the results returned? Displaying them on a custom dashboard, creating a PDF report, alerting/emailing them, ...? Why do you want to add the second dedup only when there are more than 20 results (and not all the time)?

If I am visualizing your data correctly, the additional dedup command should only remove events with the same Client/Policy/Schedule but a different jobFileList. Could you instead pipe your main search to | stats dc(jobFileList) by Client, Policy, Schedule or | stats values(jobFileList) by Client, Policy, Schedule to get a more acceptable format all the time?

timmalos
Communicator

Thanks for your help, I used Ayn's suggestion. To answer your questions: the results are shown in real time on a custom dashboard with some custom jQuery and CSS. When there are not many errors, we want to see all the errors for each client (we use all the screen space), but when there are a lot of errors I don't want to differentiate by jobFileList; I would rather display more clients in error.

(In fact I don't even display the FileList field in the final table, but I can see whether there are many errors for one client or many clients with at least one error. I know I won't have a good day in the latter case ^^)
