Splunk Search

Why am I getting incorrect stats count in my search?

surly78
Loves-to-Learn Lots

I'm trying to display a total count for each value found in attributes.eventtype field and group them by the attributes.campaignname field. I'm display these stats from 2 specified values in attributes.campaignname:

 

index=mail sourcetype="phish-campaign-logs" attributes.campaignname="Undelivered Phishing Campaign - FY21Q2 - 062421" OR attributes.campaignname="O365 Re-authentication - FY21Q3"
| spath output=eventtype attributes.eventtype
| dedup id
| stats count(eval(eventtype="Data Submission")) AS Data_Submission, count(eval(eventtype="Email Click")) AS Email_Click, count(eval(eventtype="Email View")) AS Email_View, , count(eval(eventtype="No Action")) AS No_Action, count(eval(eventtype="TM Complete")) AS TM_Complete, count(eval(eventtype="TM Sent")) AS TM_Sent BY attributes.campaignname
| addtotals

 



When running the search, I'm receiving smaller counts on each of the values in eventtype for one of the campaigns specified, "Undelivered Phishing Campaign - FY21Q2 - 062421"

stat-cnt-multi-campaign.png

 



If I only specify this campaign in my search, I'm able to receive back the expected total count on each of the values: 

 

 

index=mail sourcetype="phish-campaign-logs" attributes.campaignname="Undelivered Phishing Campaign - FY21Q2 - 062421"
| spath output=eventtype attributes.eventtype
| dedup id
| stats count(eval(eventtype="Data Submission")) AS Data_Submission, count(eval(eventtype="Email Click")) AS Email_Click, count(eval(eventtype="Email View")) AS Email_View, , count(eval(eventtype="No Action")) AS No_Action, count(eval(eventtype="TM Complete")) AS TM_Complete, count(eval(eventtype="TM Sent")) AS TM_Sent BY attributes.campaignname
| addtotals

 


stat-cnt-single-campaign.png

Please help me to make this search working properly. Thank you in advance.

Labels (3)
Tags (3)
0 Karma

bowesmana
SplunkTrust
SplunkTrust

May not be relevant, but as soon as you do a dedup, you may remove data that should not be removed.

The fact that you are doing a dedup id, implies that there is more than one event per id. Are you sure that the dedup is not removing some events that may have the eventtype you are looking for, or can the 'id' be common to different campaignname? If so, and your data looks like

eventtype=Data Submission, id=1 campaingname=O365 Re-authentication - FY21Q3
eventtype=Data Submission, id=1 campaingname=Undelivered Phishing Campaing - FY21Q2 - 062421

then the dedup id will remove the second row. Similarly the second row would be removed in this case too.

eventtype=No Action, id=1 campaingname=Undelivered Phishing Campaing - FY21Q2 - 062421
eventtype=Data Submission, id=1 campaingname=Undelivered Phishing Campaing - FY21Q2 - 062421

Also, please note that although it may not be an issue here, the field name eventtype should not be used, as it may conflict with Splunk's event types

https://docs.splunk.com/Documentation/Splunk/8.2.2/Knowledge/Abouteventtypes

Choose a different name, just to be sure.

To help you diagnose your issue, try the following search

index=mail sourcetype="phish-campaign-logs" attributes.campaignname="Undelivered Phishing Campaign - FY21Q2 - 062421" OR attributes.campaignname="O365 Re-authentication - FY21Q3"
| spath output=eventtype attributes.eventtype
| stats count BY attributes.campaignname eventtype

but try it WITH and WITHOUT the dedup id in there before the stats, which should help you understand your data. Also, you can add in 'id' as another 'by' field in the stats to see the numbers. That should help you understand the effect of dedup. 

An alternative to 

0 Karma

surly78
Loves-to-Learn Lots

I ran your search noted above and you're correct; the dedup is removing events with the same id found in the 2nd campaign.  I also should have specified in my original post as to why I'm using dedup in the first place.  There are duplicate events in each of the 2 campaigns that I'm trying to filter out, so I'm trying to figure out a way to dedup the events in each campaign without effecting other campaigns. It’s also likely additional campaigns will be added in the future to this search

0 Karma

bowesmana
SplunkTrust
SplunkTrust

Normally you can use stats values(*) as * by X

to do deduping and X can be whatever grouping you need to get the correct info. values(*) as * will aggregate all fields to their same name, but you may then find multi-value fields that occur, that will depend on your data.

 

0 Karma

somesoni2
Revered Legend

Try one of these:

index=mail sourcetype="phish-campaign-logs" attributes.campaignname="Undelivered Phishing Campaign - FY21Q2 - 062421"
| spath output=eventtype attributes.eventtype
| stats count(eval(eventtype="Data Submission")) AS Data_Submission, count(eval(eventtype="Email Click")) AS Email_Click, count(eval(eventtype="Email View")) AS Email_View, , count(eval(eventtype="No Action")) AS No_Action, count(eval(eventtype="TM Complete")) AS TM_Complete, count(eval(eventtype="TM Sent")) AS TM_Sent BY attributes.campaignname
| addtotals

 

OR 

index=mail sourcetype="phish-campaign-logs" attributes.campaignname="Undelivered Phishing Campaign - FY21Q2 - 062421"
| spath output=eventtype attributes.eventtype
| chart dc(id) by attributes.campaignname eventtype
| addtotals
0 Karma

surly78
Loves-to-Learn Lots

This seems to almost work, however, I am able to find duplicates for some of the values in the eventtype field

0 Karma

richgalloway
SplunkTrust
SplunkTrust

It may not make a difference, but I suggest using parentheses to make sure the base search is interpreted correctly.

index=mail sourcetype="phish-campaign-logs" (attributes.campaignname="Undelivered Phishing Campaign - FY21Q2 - 062421" OR attributes.campaignname="O365 Re-authentication - FY21Q3")
...
---
If this reply helps you, an upvote would be appreciated.
0 Karma
Did you miss .conf21 Virtual?

Good news! The event's keynotes and many of its breakout sessions are now available online, and still totally FREE!