I am trying to display the single total count of all messages for the current day that have the same values for a single field. I had a previous search that did just that and displayed it against the total messages but there is a question on the counts, so I need to display the single duplicate count to overcome any uncertainty.
source="/msg.stats/tcp/999" | where message_type="Log" | eventstats count as total | eventstats count as duplicates by log_id| stats count(eval(duplicates>1)) as no_duplicates values(total) as total | eval dupPercentage=no_duplicates/total*100
I thought this was working but a team member disputes the dupe stats as being too high. I assumed a simple search to display dupes would clear this up.
source="/msg.stats/tcp/999" | where message_type="Log" | stats count(eval(log_id>1)) as no_duplicates
I want to make sure that I have the searches correct so I can confidently accept the counts.
Thanks everyone for your response; before I accept an answer, the other query that the panel has that disagrees with the non dupe count my query and most of your queries all produce:
source="/msg.stats/tcp/999" | where message_type="Log" | stats dc(log_id) as UniqueCount
This produces a count of for example 753119 while the non dupe count from the first query and some of the other queries below produces 398320. As you can see a big difference. will dig deeper thanks
Remember, that dc is going to include one copy of each of your duplicate messages as well as the non-duplicated ones.
source="/msg.stats/tcp/999" message_type="Log" | eventstats count as duplicates by log_id | eval ImADuplicate = if(duplicates>1,1,0) | eventstats dc(log_id) as UniqueDupCount by ImADuplicate | eval UniqueNonDupCount = nullif(UniqueDupCount*(1-ImADuplicate),0) | eval UniqueDupCount = nullif(UniqueDupCount*ImADuplicate,0) | stats count as TotCount, first(UniqueDupCount) as UniqueDupCount, first(UniqueNonDupCount) as UniqueNonDupCount, sum(ImADuplicate) as DupCount, dc(log_id) as UniqueCount | eval NonDupCount = UniqueCount - UniqueDupCount | eval ExtraCount = DupCount - UniqueDupCount | table TotCount UniqueCount NonDupCount UniquenonDupCount UniqueDupCount DupCount ExtraCount
Slight mod to get UniqueDupCount correct.
wow; I am brain-dead right now, forgive me. Did not realize it would be so difficult get a simple duplicate total count and unique total count.
So if I understand correctly: using some simple numbers. total 1000 messages delivered for today.
100 of these messages have duplicates which would mean a minimum of 200 total duplicate messages if 2 occurrences of the message for a given log_id.
that means 800 of the total do not have dupes.
and finally the total unique messages would be 900 which is the unique + the count of those that have dupes. Correct?
dc(log_id) would give me the total unique count of events that include those that have dupes but only include that event once in the total count - 900
| eventstats count as duplicates by log_id
| eval ImADuplicate = if(duplicates>1,1,0) this should give the me the dupes?
Really appreciate it you taking the time to clarify this.
Try like this
source="/msg.stats/tcp/999" | where message_type="Log" | stats count by log_id | eval notDup=if(count=1,1,0) | stats sum(count) as total sum(notDup) as notDup | eval no_duplicates=total-notDup | eval dupPercentage=no_duplicates*100/total
Do something like this to look at some examples of your duplicates, and see if there's another problem of some sort -
source="/msg.stats/tcp/999" message_type="Log" | join log_id [search source="/msg.stats/tcp/999" message_type="Log" | stats count as duplicates by log_id | search duplicates > 1 | sort duplicates | head 2 | table log_id]
it should be pretty obvious if they are non-dups that somehow got the same log_id, or if they are real duplicates.
Finally, in case there's something subtle about your code that isn't working, here's the way I'd code that percentage calculation -
source="/msg.stats/tcp/999" message_type="Log" | eventstats count as total | eventstats count as duplicates by log_id | eval ImADuplicate = if(duplicates>1,1,0) | stats count as EventCount, sum(ImADuplicate) as DupCount, first(total) as total | eval NonDupCount = EventCount - DupCount | eval DupPercentage=DupCount/EventCount*100 | eval NonDupPercentage=NonDupCount/EventCount*100 | table total EventCount DupCount NonDupCount DupPercentage NonDupPercentage