Splunk Search

Dedup different types of events by different fields

dtaylor
Path Finder

If I look at this long enough, I'm sure I'll eventually figure it out, but that could be a whole month at my current rate....

 

I've been tasked with trying to calculate the number of unique alerts over the month of February from our notable index. The raw event count is a little over a thousand, spread across 30+ alerts. However, we have a considerable amount of duplicates in that pile. I need to somehow intelligently dedup those 1000+ notable events using the unique fields in each alert type so I see only the number of actual unique events for each alert type.

 

As it is, I've been attempting to solve this using the map command as shown below

index=notable search_name=*
| search [| inputlookup approved_detections.csv]
| map maxsearches=1 search="| inputlookup monthly_alert_metrics.csv
| where alert_name=\"$search_name$\"
| stats count BY alert_name"

 

In this above search, I query the entire notable index and then filter out all the events whose search_name field is not found within the approved_detections.csv

This is where I'm left with roughly 1000+ events from different alerts firing. Then I try to use the map command. The second lookup table, 'monthly_alert_metrics.csv', is a lookup I made containing each of the unique alert types (a little over 30) along with the fields I'd want to dedup that specific alert type on. An example is shown below:

alert_name,val1,val2,val3,val4
virus_alert,host,src_ip,threat_signature,mac

 

This example shows a mock alert "virus_alert" with the four fields I'd want to dedup on (host, src_ip, threat_signature, mac).


yuanliu
SplunkTrust

Let me know if I get this right: the unique combination in each row of monthly_alert_metrics.csv (i.e., not just alert_name) represents an alert type; you want to know how many such unique combinations there are in a month; however, some rows may not have all of val1, val2, val3, val4 populated.  Is this correct?

First, if you really, really need to use dedup, know that it has a keepempty option (see Optional arguments).  If the above is what you want, this will do:

index=notable search_name=*
 [| inputlookup approved_detections.csv] ``` no need for a second search command ```
| dedup keepempty=true search_name val1 val2 val3
| stats count

But dedup is often slower than stats.  You can simply fillnull any missing field.

index=notable search_name=*
 [| inputlookup approved_detections.csv]
| fillnull value=MISSING search_name val1 val2 val3
| stats count by search_name val1 val2 val3
| stats count
  1. If my interpretation is correct, I cannot find any use of monthly_alert_metrics.csv.
  2. Your description states "whose search_name field is not found within the approved_detections.csv", but your code does the opposite.  If you truly mean "NOT found within" (as opposed to "found within"), use:
index=notable search_name=*
  NOT [| inputlookup approved_detections.csv]
| fillnull value=MISSING search_name val1 val2 val3
| stats count by search_name val1 val2 val3
| stats count
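To see why the double stats works: the first stats collapses the events down to one row per unique field combination, and the second counts those rows. A toy illustration with mock data (makeresults format=csv requires a reasonably recent Splunk version):

| makeresults format=csv data="search_name,host,user
wrong_password,hostA,alice
wrong_password,hostA,alice
virus_alert,hostB,bob"
| stats count by search_name host user ``` one row per unique combination ```
| stats count                          ``` the duplicate collapses, leaving 2 ```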

 

Bottom line: map is usually not the answer.

Hope this helps.


dtaylor
Path Finder

My apologies, let me try to explain better then. The 1000+ events found in the notable index are composed of roughly 30 different unique alerts. A lot of those 1000+ events are duplicates wherein throttling isn't configured properly on an alert, and it triggers multiple times for what is ultimately the same activity.


I need to get rid of the duplicates for each type of alert (roughly 30+). For example, a few of the alert types might be:

-virus_alert (with fields like host, user, signature, action, file_path)

-wrong_password (with fields like host, user, time, logon_type)

-suspicious_download (with fields like host, user, url, and file_name)

 

Using those three to continue my example, the virus_alert might have fired 60 times in the past month, meaning there are 60 events in the notable index for that alert. However, in reality, there are really only 30 unique incidents because the alert fires twice for each occurrence. This means I need to dedup the alert to filter out the duplicates. To do this, I could write the following search


index=notable search_name="virus_alert"
| dedup signature, host, action

 

Continuing, the wrong_password alert might have triggered 17 times in the past month, but it's all really only 1 unique incident that triggered the alert 17 times. I might write the below search to filter out the duplicates

index=notable search_name="wrong_password"
| dedup user host

 

Now, if I hated myself, I could try to do this manually for each of the 30+ alert types to get the true number of unique incidents which occurred over the last month, writing a manual search followed by a custom dedup for each. Suffice it to say, I'd rather avoid that.
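For what it's worth, a sketch of that manual approach collapsed into a single search might look like the below (continuing with my example field names), though it still hard-codes every alert type:

index=notable search_name=*
| eval dedup_key = case(
    search_name=="virus_alert", signature."|".host."|".action,
    search_name=="wrong_password", user."|".host)
| dedup search_name dedup_key
| stats count BY search_name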

 

I'd rather use a CSV lookup table (monthly_alert_metrics.csv) which I've populated with the names of each alert (field alert_name) followed by the unique fields I want to dedup that specific alert type on (fields val1, val2, val3, ...). I've attached an example of what this table looks like to this post (had to change file extension to .txt).

 

From there, I'd hoped I could use map to iterate through each alert type and dedup it according to the values within the CSV.
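Something along the lines of the below sketch, untested; it assumes every row of the CSV has all four val columns populated, since an empty $val4$ would leave a gap in the dedup field list, and maxsearches is just set above my row count:

| inputlookup monthly_alert_metrics.csv
| map maxsearches=40 search="search index=notable search_name=\"$alert_name$\"
    | dedup $val1$ $val2$ $val3$ $val4$
    | stats count BY search_name"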

 

Hopefully this clears it up.


yuanliu
SplunkTrust

Thank you for explaining the dataset and logic.  So, my speculation was not far off.  The difference is that I didn't realize that the names val1, val2, etc., are your choice, not field names carried in the data.

The solution is the same.  Just use the field names in the data.  For the three types, you can do

index=notable search_name=*
 [| inputlookup approved_detections.csv]
| fillnull value=MISSING action file_name host signature url user
| stats count by search_name action file_name host signature url user
| stats count

Now, there is a chance that the actual field names are very diverse and difficult to maintain in code.  In that case, you may want to maintain monthly_alert_metrics.csv instead.  If that is the case, you can easily maintain the above code by using this search

| inputlookup monthly_alert_metrics.csv
| foreach val*
    [eval everyfield = mvappend(everyfield, <<FIELD>>)]
| stats values(everyfield) as everyfield
| eval everyfield = mvjoin(everyfield, ",")

Then, copy the value of everyfield into the code.
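For example, fed only the mock virus_alert row from earlier, that search would return something like

everyfield = host,mac,src_ip,threat_signature

(stats values() dedups and sorts the field names, which is fine here since order doesn't matter for the by clause.)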
