I have a search that monitors my NetBackup jobs in real time.
search = index=Infra_NB sourcetype="NbJobs" site=$site$ (NOT HiddenByOperator="*")
| fillnull value="-" Client jobCopy Policy Schedule jobFileList
| dedup Client Policy Schedule jobFileList sortby -_time
| dedup jobId sortby -_time
| search jobStatus>1 jobStatus!=150
| sort -_time
| table Type Date site Client Policy Status
I have between 2 and 20 results most of the time, but when there is a lot of trouble the list can grow considerably. What I would like is to add another dedup to the search (| dedup Client Policy Schedule sortby -_time), but only if there are more than 20 results.
I tried something in this direction:
|eventstats count | eval test=if(count>20,DEDUP,nothing)
Thanks for your help,
I imagine you could achieve something like this using a combination of eventstats and streamstats:

... | eventstats count as totalcount | streamstats count as dcount by Client,Policy,Schedule | where count>20 AND dcount<2

eventstats gets the total count of events, and streamstats assigns a running count for each combination of Client,Policy,Schedule. where then checks whether the condition count>20 has been met and, if so, filters to the events where dcount<2, that is, only one event per combination you want to dedup on.
Ah yes, most definitely, I wrote the search off the top of my head so it was bound to have bugs in it right from the start - great that you got it working!
Thanks a lot. This is the corrected version (I also replaced where with search, since where isn't needed here and, as far as I know, it's better to use search when possible):
|eventstats count|streamstats count as dcount by Client,Policy,Schedule|search (count>20 AND dcount<2) OR count<=20
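For anyone who wants to see the mechanics of that search outside Splunk, here is a minimal Python sketch of what the corrected pipeline does. The field names and the 20-result threshold come from the thread; the event data and the function name are invented for illustration:

```python
# Emulates: | eventstats count
#           | streamstats count as dcount by Client,Policy,Schedule
#           | search (count>20 AND dcount<2) OR count<=20
def conditional_dedup(events, threshold=20):
    # eventstats count: total number of events, known before filtering
    total = len(events)
    # streamstats count by Client,Policy,Schedule: running count per group
    seen = {}
    kept = []
    for ev in events:
        key = (ev["Client"], ev["Policy"], ev["Schedule"])
        seen[key] = seen.get(key, 0) + 1
        dcount = seen[key]
        # keep everything when under the threshold; above it, keep only
        # the first event of each Client/Policy/Schedule combination
        if total <= threshold or dcount < 2:
            kept.append(ev)
    return kept
```

The key point is that eventstats computes the total over all events before any filtering happens, which is why the condition can "see" the full result count even though the filter runs per event.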
There isn't really any programmatic logic built-in to Splunk search commands to do this, but there still may be a way to accomplish your end goal.
What are you doing with the results returned? Displaying them on a custom dashboard, creating a PDF report, alerting/emailing them, ... ? Why do you want to add the second dedup only when there are more than 20 results (and not all the time)?
If I am visualizing your data correctly, the additional dedup command would only remove events with the same Client/Policy/Schedule but different jobFileList values. Could you instead pipe your main search to | stats dc(jobFileList) by Client, Policy, Schedule
or | stats values(jobFileList) by Client, Policy, Schedule
to get a more acceptable format all the time?
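To illustrate what those stats variants would produce, here is a hypothetical Python equivalent combining both dc() and values() in one pass. The field names are from the thread; the sample data and function name are made up:

```python
from collections import defaultdict

# Emulates: | stats dc(jobFileList) values(jobFileList)
#           by Client, Policy, Schedule
def stats_by_group(events):
    groups = defaultdict(set)
    for ev in events:
        key = (ev["Client"], ev["Policy"], ev["Schedule"])
        groups[key].add(ev["jobFileList"])
    # one row per Client/Policy/Schedule, with the distinct count
    # and the sorted distinct values of jobFileList
    return {key: {"dc": len(files), "values": sorted(files)}
            for key, files in groups.items()}
```

This keeps one row per Client/Policy/Schedule no matter how many results there are, which is why it avoids the need for a conditional dedup in the first place.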
Thanks for your help, I used Ayn's suggestion. To answer your questions: the results are displayed on a custom dashboard with some custom jQuery and CSS, updated in real time. When there are not many errors we want to see every error for a client (we use all the screen space), but when there are lots of errors I don't want to differentiate by jobFileList; I'd rather display more clients in error.
(In fact I don't even display the FileList field in the final table, but this way I can see whether there are many errors for one client, or many clients that each have at least one error. I know I'm not going to have a good day in that case ^^)