Through a dbx query, I'm pulling out several columns, among which include account_email, true_ip, device_id, and request_id. I want to see only the events where any of these columns have duplicate values. For example, if 3 events have the account_email of test@gmail.com, then I'd want to see all 3 events, regardless of what's in the other columns. The same applies for true_ip, device_id, and request_id. How can I do this?
I tried something like this:
dbxquery=query_here | stats count by account_email, true_ip, account_telephone, device_id, request_id, session_id, digital_id | where count > 1
But this doesn't work because this would only extract occurrences where the combination of all those columns happened more than once. But as long as at least one of the columns have a duplicate occurrence, I'd want to see it.
Hi @retro-bloke
This is a perfect case for eventstats command. Here's an example of how to do it
| makeresults
| eval _raw="account_email,true_ip,device_id,request_id
bob@foobar.com,1.1.1.1,abc,123
sid@foobar.com,1.1.1.2,def,456
bob@foobar.com,1.1.1.2,ghi,789
bob@foo.com,1.1.1.3,jkl,999
"
| multikv forceheader=1
``` ^^^ create dummy events ^^^ ```
| eventstats count AS dups:account_email BY account_email
| eventstats count AS dups:true_ip BY true_ip
| eventstats count AS dups:device_id BY device_id
| eventstats count AS dups:request_id BY request_id
| addtotals dups*
| where Total > 4
| table account_email,true_ip,device_id,request_id
Hope that helps
Hi @retro-bloke
This is a perfect case for eventstats command. Here's an example of how to do it
| makeresults
| eval _raw="account_email,true_ip,device_id,request_id
bob@foobar.com,1.1.1.1,abc,123
sid@foobar.com,1.1.1.2,def,456
bob@foobar.com,1.1.1.2,ghi,789
bob@foo.com,1.1.1.3,jkl,999
"
| multikv forceheader=1
``` ^^^ create dummy events ^^^ ```
| eventstats count AS dups:account_email BY account_email
| eventstats count AS dups:true_ip BY true_ip
| eventstats count AS dups:device_id BY device_id
| eventstats count AS dups:request_id BY request_id
| addtotals dups*
| where Total > 4
| table account_email,true_ip,device_id,request_id
Hope that helps
Nailed it @yeahnah , thanks!!
Next challenge, is it possible to group the table of results by their common field?