If I look at this long enough, I'm sure I'll eventually figure it out, but that could be a whole month at my current rate....
I've been tasked with calculating the number of unique alerts over the month of February from our notable index. The raw data is a little over a thousand events spread across 30+ alerts. However, there's a considerable number of duplicates in that pile. I need to somehow intelligently dedup those 1000+ notable events using the unique fields in each alert type, so I see only the number of actual unique events for each alert type.
As it is, I've been attempting to solve this with the map command, as shown below:
index=notable search_name=*
| search [| inputlookup approved_detections.csv]
| map maxsearches=1 search="| inputlookup monthly_alert_metrics.csv
| where alert_name=\"$search_name$\"
| stats count BY alert_name"
In the above search, I query the entire notable index and then filter out all the events whose search_name field is not found within approved_detections.csv.
This leaves me with the roughly 1000+ events from the different alerts firing. Then I try to use the map command. The second lookup table, monthly_alert_metrics.csv, is a lookup I made containing each of the unique alert types (a little over 30) along with the fields I'd want to dedup that specific alert type on. An example is shown below:
alert_name,val1,val2,val3,val4
virus_alert,host,src_ip,threat_signature,mac
This example shows a mock alert "virus_alert" with the four fields I'd want to dedup it on (host, src_ip, threat_signature, mac).
Let me know if I get this right: the unique combination in each row of monthly_alert_metrics.csv (i.e., not just alert_name) represents an alert type; you want to know how many such unique combinations there are in a month; however, some rows may not have all of val1, val2, val3, val4 populated. Is this correct?
First, if you really, really need to use dedup, know that it has a keepempty option (see Optional arguments in the dedup documentation). If the above is what you want, this will do:
index=notable search_name=*
[| inputlookup approved_detections.csv] ``` no need for a second search command ```
| dedup keepempty=true search_name val1 val2 val3
| stats count

But dedup is often slower than stats. You can simply fillnull any missing field.
index=notable search_name=*
[| inputlookup approved_detections.csv]
| fillnull value=MISSING search_name val1 val2 val3
| stats count by search_name val1 val2 val3
| stats count

The same tally for events whose search_name is not in the approved list just negates the subsearch:

index=notable search_name=*
NOT [| inputlookup approved_detections.csv]
| fillnull value=MISSING search_name val1 val2 val3
| stats count by search_name val1 val2 val3
| stats count
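Outside Splunk, the dedup-then-count logic above (fill missing key fields with a sentinel, group by the full key, count the groups) can be sketched in Python; the events and field names here are made-up examples, not real notable data:

```python
# Sketch of the fillnull + "stats count by" + "stats count" pipeline.
# Events and key fields are mock examples.
events = [
    {"search_name": "virus_alert", "val1": "hostA", "val2": "1.2.3.4"},
    {"search_name": "virus_alert", "val1": "hostA", "val2": "1.2.3.4"},  # duplicate
    {"search_name": "wrong_password", "val1": "hostB"},  # val2/val3 missing
]

KEY_FIELDS = ["search_name", "val1", "val2", "val3"]

# fillnull value=MISSING: substitute a sentinel for absent fields,
# so events with missing keys still form a well-defined group
keys = {tuple(e.get(f, "MISSING") for f in KEY_FIELDS) for e in events}

# "stats count by <keys>" followed by "stats count" == number of unique keys
print(len(keys))  # 2 unique combinations
```

The sentinel matters: without it, an event missing val2 would silently drop out of the grouping, which is exactly what keepempty=true guards against in dedup.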
Bottom line: map is usually not the answer.
Hope this helps.
My apologies, let me try to explain better then. The 1000+ events found in the notable index are composed of roughly 30 different unique alerts. A lot of those 1000+ events are duplicates wherein throttling isn't configured properly on an alert, and it triggers multiple times for what is ultimately the same activity.
I need to get rid of the duplicates for each type of alert (roughly 30+). For example, a few of the alert types might be:
- virus_alert (with fields like host, user, signature, action, file_path)
- wrong_password (with fields like host, user, time, logon_type)
- suspicious_download (with fields like host, user, url, and file_name)
Using those three to continue my example: virus_alert might have fired 60 times in the past month, meaning there are 60 events in the notable index for that alert. However, in reality there are really only 30 unique incidents, because the alert fires twice for each occurrence. This means I need to dedup the alert to filter out the duplicates. To do this, I could write the following search:
index=notable search_name="virus_alert"
| dedup signature, host, action
Continuing, the wrong_password alert might have triggered 17 times in the past month, but it's really all only 1 unique incident that triggered the alert 17 times. I might write the below search to filter out the duplicates:
index=notable search_name="wrong_password"
| dedup user host
Now, if I hated myself, I could do this manually for each of the 30+ alert types to get the true number of unique incidents which occurred over the last month, writing a manual search followed by a custom dedup for each. Suffice it to say, I'd rather avoid that.
I'd rather use a CSV lookup table (monthly_alert_metrics.csv) which I've populated with the name of each alert (field alert_name) followed by the unique fields I want to dedup that specific alert type on (fields val1, val2, val3, ...). I've attached an example of what this table looks like to this post (had to change the file extension to .txt).
From there, I'd hoped I could use map to iterate through each alert type and dedup it according to the values within the CSV.
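For what it's worth, the per-alert-type dedup described above (look up each alert's own key fields, then deduplicate on just those fields) can be sketched outside Splunk like this; every alert name and field here is a mock example, not real data:

```python
# Per-alert dedup keys, as rows of monthly_alert_metrics.csv would define them.
# All names are mock examples.
dedup_fields = {
    "virus_alert": ["signature", "host", "action"],
    "wrong_password": ["user", "host"],
}

events = [
    {"search_name": "virus_alert", "signature": "evil", "host": "h1", "action": "blocked"},
    {"search_name": "virus_alert", "signature": "evil", "host": "h1", "action": "blocked"},
    {"search_name": "wrong_password", "user": "bob", "host": "h2"},
    {"search_name": "wrong_password", "user": "bob", "host": "h2"},
    {"search_name": "wrong_password", "user": "bob", "host": "h2"},
]

# Count unique incidents per alert type: each event is reduced to the
# tuple of its own alert's key fields, and duplicates collapse in a set.
unique = {}
for e in events:
    name = e["search_name"]
    key = tuple(e.get(f) for f in dedup_fields.get(name, []))
    unique.setdefault(name, set()).add(key)

counts = {name: len(keys) for name, keys in unique.items()}
print(counts)  # {'virus_alert': 1, 'wrong_password': 1}
```

This is the same iteration map would perform, just expressed as a single pass with a per-alert key, which is why the stats-based answers in this thread avoid map entirely.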
Hopefully this clears it up.
Thank you for explaining the dataset and logic. So, my speculation was not far off. The difference is that I didn't realize the names val1, val2, etc., are your choice, not field names carried in the data.
The solution is the same. Just use the field names in the data. For the three types, you can do:
index=notable search_name=*
[| inputlookup approved_detections.csv]
| fillnull value=MISSING action file_name host signature url user
| stats count by search_name action file_name host signature url user
| stats count

Now, there is a chance that the actual field names are very diverse and difficult to maintain in code. In that case, you may want to maintain monthly_alert_metrics.csv instead. If so, you can easily generate the field list for the above code using this search:
| inputlookup monthly_alert_metrics.csv
| foreach val*
[eval everyfield = mvappend(everyfield, <<FIELD>>)]
| stats values(everyfield) as everyfield
| eval everyfield = mvjoin(everyfield, ",")

Then, copy the value of everyfield into the code.
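The foreach/mvappend/mvjoin step just collects every distinct field name found across the val* columns into one comma-joined string. The same idea, sketched in Python with mock lookup rows:

```python
# Mock rows of monthly_alert_metrics.csv; the real field names will differ.
rows = [
    {"alert_name": "virus_alert", "val1": "host", "val2": "src_ip", "val3": "threat_signature"},
    {"alert_name": "wrong_password", "val1": "host", "val2": "user"},
]

# foreach val* [eval everyfield = mvappend(everyfield, <<FIELD>>)]
# followed by stats values(everyfield): gather distinct field names
# across all val* columns
names = set()
for row in rows:
    for col, value in row.items():
        if col.startswith("val") and value:
            names.add(value)

# mvjoin(everyfield, ","): one comma-joined string to paste into the search
everyfield = ",".join(sorted(names))
print(everyfield)  # host,src_ip,threat_signature,user
```

Sorting mirrors stats values(), which returns its multivalue result in lexicographic order, so the pasted list is stable even as the lookup grows.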