I'm trying to display a total count for each value found in attributes.eventtype field and group them by the attributes.campaignname field. I'm display these stats from 2 specified values in attributes.campaignname:
index=mail sourcetype="phish-campaign-logs" attributes.campaignname="Undelivered Phishing Campaign - FY21Q2 - 062421" OR attributes.campaignname="O365 Re-authentication - FY21Q3"
| spath output=eventtype attributes.eventtype
| dedup id
| stats count(eval(eventtype="Data Submission")) AS Data_Submission, count(eval(eventtype="Email Click")) AS Email_Click, count(eval(eventtype="Email View")) AS Email_View, , count(eval(eventtype="No Action")) AS No_Action, count(eval(eventtype="TM Complete")) AS TM_Complete, count(eval(eventtype="TM Sent")) AS TM_Sent BY attributes.campaignname
| addtotals
When running the search, I'm receiving smaller counts on each of the values in eventtype for one of the campaigns specified, "Undelivered Phishing Campaign - FY21Q2 - 062421"
If I only specify this campaign in my search, I'm able to receive back the expected total count on each of the values:
index=mail sourcetype="phish-campaign-logs" attributes.campaignname="Undelivered Phishing Campaign - FY21Q2 - 062421"
| spath output=eventtype attributes.eventtype
| dedup id
| stats count(eval(eventtype="Data Submission")) AS Data_Submission, count(eval(eventtype="Email Click")) AS Email_Click, count(eval(eventtype="Email View")) AS Email_View, , count(eval(eventtype="No Action")) AS No_Action, count(eval(eventtype="TM Complete")) AS TM_Complete, count(eval(eventtype="TM Sent")) AS TM_Sent BY attributes.campaignname
| addtotals
Please help me to make this search working properly. Thank you in advance.
May not be relevant, but as soon as you do a dedup, you may remove data that should not be removed.
The fact that you are doing a dedup id, implies that there is more than one event per id. Are you sure that the dedup is not removing some events that may have the eventtype you are looking for, or can the 'id' be common to different campaignname? If so, and your data looks like
eventtype=Data Submission, id=1 campaingname=O365 Re-authentication - FY21Q3
eventtype=Data Submission, id=1 campaingname=Undelivered Phishing Campaing - FY21Q2 - 062421
then the dedup id will remove the second row. Similarly the second row would be removed in this case too.
eventtype=No Action, id=1 campaingname=Undelivered Phishing Campaing - FY21Q2 - 062421
eventtype=Data Submission, id=1 campaingname=Undelivered Phishing Campaing - FY21Q2 - 062421
Also, please note that although it may not be an issue here, the field name eventtype should not be used, as it may conflict with Splunk's event types
https://docs.splunk.com/Documentation/Splunk/8.2.2/Knowledge/Abouteventtypes
Choose a different name, just to be sure.
To help you diagnose your issue, try the following search
index=mail sourcetype="phish-campaign-logs" attributes.campaignname="Undelivered Phishing Campaign - FY21Q2 - 062421" OR attributes.campaignname="O365 Re-authentication - FY21Q3"
| spath output=eventtype attributes.eventtype
| stats count BY attributes.campaignname eventtype
but try it WITH and WITHOUT the dedup id in there before the stats, which should help you understand your data. Also, you can add in 'id' as another 'by' field in the stats to see the numbers. That should help you understand the effect of dedup.
An alternative to
I ran your search noted above and you're correct; the dedup is removing events with the same id found in the 2nd campaign. I also should have specified in my original post as to why I'm using dedup in the first place. There are duplicate events in each of the 2 campaigns that I'm trying to filter out, so I'm trying to figure out a way to dedup the events in each campaign without effecting other campaigns. It’s also likely additional campaigns will be added in the future to this search
Normally you can use stats values(*) as * by X
to do deduping and X can be whatever grouping you need to get the correct info. values(*) as * will aggregate all fields to their same name, but you may then find multi-value fields that occur, that will depend on your data.
Try one of these:
index=mail sourcetype="phish-campaign-logs" attributes.campaignname="Undelivered Phishing Campaign - FY21Q2 - 062421"
| spath output=eventtype attributes.eventtype
| stats count(eval(eventtype="Data Submission")) AS Data_Submission, count(eval(eventtype="Email Click")) AS Email_Click, count(eval(eventtype="Email View")) AS Email_View, , count(eval(eventtype="No Action")) AS No_Action, count(eval(eventtype="TM Complete")) AS TM_Complete, count(eval(eventtype="TM Sent")) AS TM_Sent BY attributes.campaignname
| addtotals
OR
index=mail sourcetype="phish-campaign-logs" attributes.campaignname="Undelivered Phishing Campaign - FY21Q2 - 062421"
| spath output=eventtype attributes.eventtype
| chart dc(id) by attributes.campaignname eventtype
| addtotals
This seems to almost work, however, I am able to find duplicates for some of the values in the eventtype field
It may not make a difference, but I suggest using parentheses to make sure the base search is interpreted correctly.
index=mail sourcetype="phish-campaign-logs" (attributes.campaignname="Undelivered Phishing Campaign - FY21Q2 - 062421" OR attributes.campaignname="O365 Re-authentication - FY21Q3")
...