need query to remove duplicates from count stats
Sample input
event email
abc xyz@email.com
abc xyz@email.com
abc. test@email.com
abc. test@email.com
xyz xyz@email.com
Expected output
event | count |
abc | 2 |
xyz | 1 |
what I am getting
event | count |
abc | 4 |
xyz | 1 |
It is good that you try to illustrate input and desired output. But you forget to tell us what you are trying to count that should either be 4 or 2? In other words, you need to explain the logic between input and desired output fully and explicitly.
If I take a wild mind reading, you want to count unique number of E-mails related to each type of event. You want to use distinctcount or dc, not count.
| stats dc(email) as count by event
Here's an emulation of your mock input
| makeresults format=csv data="_raw
abc xyz@email.com
abc xyz@email.com
abc. test@email.com
abc. test@email.com
xyz xyz@email.com"
| rex "(?<event>\w+)\W+(?<email>\S+)"
``` data emulation above ```
The output is
event | count |
abc | 2 |
xyz | 1 |
It is good that you try to illustrate input and desired output. But you forget to tell us what you are trying to count that should either be 4 or 2? In other words, you need to explain the logic between input and desired output fully and explicitly.
If I take a wild mind reading, you want to count unique number of E-mails related to each type of event. You want to use distinctcount or dc, not count.
| stats dc(email) as count by event
Here's an emulation of your mock input
| makeresults format=csv data="_raw
abc xyz@email.com
abc xyz@email.com
abc. test@email.com
abc. test@email.com
xyz xyz@email.com"
| rex "(?<event>\w+)\W+(?<email>\S+)"
``` data emulation above ```
The output is
event | count |
abc | 2 |
xyz | 1 |