I am trying to display the single total count of all messages for the current day that have the same values for a single field. I had a previous search that did just that and displayed it against the total messages but there is a question on the counts, so I need to display the single duplicate count to overcome any uncertainty.
source="/msg.stats/tcp/999" | where message_type="Log" | eventstats count as total | eventstats count as duplicates by log_id| stats count(eval(duplicates>1)) as no_duplicates values(total) as total | eval dupPercentage=no_duplicates/total*100
I thought this was working but a team member disputes the dupe stats as being too high. I assumed a simple search to display dupes would clear this up.
source="/msg.stats/tcp/999" | where message_type="Log" | stats count(eval(log_id>1)) as no_duplicates
I want to make sure that I have the searches correct so I can confidently accept the counts.
Thanks everyone for your response; before I accept an answer, the other query that the panel has that disagrees with the non dupe count my query and most of your queries all produce:
source="/msg.stats/tcp/999" | where message_type="Log" | stats dc(log_id) as UniqueCount
This produces a count of for example 753119 while the non dupe count from the first query and some of the other queries below produces 398320. As you can see a big difference. will dig deeper thanks
Remember, that dc is going to include one copy of each of your duplicate messages as well as the non-duplicated ones.
source="/msg.stats/tcp/999" message_type="Log"
| eventstats count as duplicates by log_id
| eval ImADuplicate = if(duplicates>1,1,0)
| eventstats dc(log_id) as UniqueDupCount by ImADuplicate
| eval UniqueNonDupCount = nullif(UniqueDupCount*(1-ImADuplicate),0)
| eval UniqueDupCount = nullif(UniqueDupCount*ImADuplicate,0)
| stats count as TotCount,
first(UniqueDupCount) as UniqueDupCount,
first(UniqueNonDupCount) as UniqueNonDupCount,
sum(ImADuplicate) as DupCount,
dc(log_id) as UniqueCount
| eval NonDupCount = UniqueCount - UniqueDupCount
| eval ExtraCount = DupCount - UniqueDupCount
| table TotCount UniqueCount NonDupCount UniquenonDupCount UniqueDupCount DupCount ExtraCount
Slight mod to get UniqueDupCount correct.
wow; I am brain-dead right now, forgive me. Did not realize it would be so difficult get a simple duplicate total count and unique total count.
So if I understand correctly: using some simple numbers. total 1000 messages delivered for today.
100 of these messages have duplicates which would mean a minimum of 200 total duplicate messages if 2 occurrences of the message for a given log_id.
that means 800 of the total do not have dupes.
and finally the total unique messages would be 900 which is the unique + the count of those that have dupes. Correct?
dc(log_id) would give me the total unique count of events that include those that have dupes but only include that event once in the total count - 900
| eventstats count as duplicates by log_id
| eval ImADuplicate = if(duplicates>1,1,0) this should give the me the dupes?
Really appreciate it you taking the time to clarify this.
Try like this
source="/msg.stats/tcp/999" | where message_type="Log"
| stats count by log_id | eval notDup=if(count=1,1,0)
| stats sum(count) as total sum(notDup) as notDup
| eval no_duplicates=total-notDup
| eval dupPercentage=no_duplicates*100/total
Thanks appreciate your response.
Do something like this to look at some examples of your duplicates, and see if there's another problem of some sort -
source="/msg.stats/tcp/999" message_type="Log"
| join log_id
[search source="/msg.stats/tcp/999" message_type="Log"
| stats count as duplicates by log_id
| search duplicates > 1
| sort duplicates
| head 2
| table log_id]
it should be pretty obvious if they are non-dups that somehow got the same log_id, or if they are real duplicates.
Finally, in case there's something subtle about your code that isn't working, here's the way I'd code that percentage calculation -
source="/msg.stats/tcp/999" message_type="Log"
| eventstats count as total
| eventstats count as duplicates by log_id
| eval ImADuplicate = if(duplicates>1,1,0)
| stats count as EventCount, sum(ImADuplicate) as DupCount, first(total) as total
| eval NonDupCount = EventCount - DupCount
| eval DupPercentage=DupCount/EventCount*100
| eval NonDupPercentage=NonDupCount/EventCount*100
| table total EventCount DupCount NonDupCount DupPercentage NonDupPercentage
How about this query? Anything where count is greater than 1 has duplicates.
source="/msg.stats/tcp/999" message_type="Log" log_id > 1 | stats count by log_id | sort -count