Solved: How to exclude duplicate events based on a field v...

russell120 · ‎09-30-2019

Hi, I have an "asset discovery" type of query that uses a CSV and 4+ indexes, and produces tens of thousands of results look similar to this:

id     device     serial  origin
111  routerAlpha  12345   sales
     routerAlpha  12345
222  switchBeta   78787   devops
     switchBeta   78787   
    desktopSigma  44444   
    desktopSigma  44444   
    serverTheta   55555

How do I:
1) Remove all events that have a serial with a null id IF the same serial appears again in an event with a numerical id
2) Then, deduplicate all other events that have a serial # with a null id

The result should look like this:

id     device     serial  origin
111  routerAlpha  12345   sales
222  switchBeta   78787   devops
    desktopSigma  44444   
    serverTheta   55555

jacobpevans · ‎09-30-2019

Greetings @russell120,

This run-anywhere search should give you what you need. Fillnull is necessary so that stats does not drop the null entries. Then, you count the number of unique device rows. If there are multiple device rows AND id is null, then drop those. You can use whatever you want instead of "-", and you can use replace to change back to null if necessary at the end.

           | makeresults | eval id="111"   , device="routerAlpha" ,  serial="12345" , origin="sales"
| append [ | makeresults | eval id=null()  , device="routerAlpha" ,  serial=null()  , origin=null()   ]
| append [ | makeresults | eval id="222"   , device="switchBeta"  ,  serial="78787" , origin="devops" ]
| append [ | makeresults | eval id=null()  , device="switchBeta"  ,  serial="78787" , origin=null()   ]
| append [ | makeresults | eval id=null()  , device="desktopSigma",  serial="44444" , origin=null()   ]
| append [ | makeresults | eval id=null()  , device="desktopSigma",  serial="44444" , origin=null()   ]
| append [ | makeresults | eval id=null()  , device="serverTheta" ,  serial="55555" , origin=null()   ]
| fillnull value="-"
| stats      count by id device serial origin
| eventstats count by device
| search NOT(count>1 id="-")
| fields   - count
| sort                id device serial

Cheers,
Jacob

If you feel this response answered your question, please do not forget to mark it as such. If it did not, but you do have the answer, feel free to answer your own post and accept that as the answer.

View solution in original post

jacobpevans · ‎09-30-2019

Greetings @russell120,

This run-anywhere search should give you what you need. Fillnull is necessary so that stats does not drop the null entries. Then, you count the number of unique device rows. If there are multiple device rows AND id is null, then drop those. You can use whatever you want instead of "-", and you can use replace to change back to null if necessary at the end.

           | makeresults | eval id="111"   , device="routerAlpha" ,  serial="12345" , origin="sales"
| append [ | makeresults | eval id=null()  , device="routerAlpha" ,  serial=null()  , origin=null()   ]
| append [ | makeresults | eval id="222"   , device="switchBeta"  ,  serial="78787" , origin="devops" ]
| append [ | makeresults | eval id=null()  , device="switchBeta"  ,  serial="78787" , origin=null()   ]
| append [ | makeresults | eval id=null()  , device="desktopSigma",  serial="44444" , origin=null()   ]
| append [ | makeresults | eval id=null()  , device="desktopSigma",  serial="44444" , origin=null()   ]
| append [ | makeresults | eval id=null()  , device="serverTheta" ,  serial="55555" , origin=null()   ]
| fillnull value="-"
| stats      count by id device serial origin
| eventstats count by device
| search NOT(count>1 id="-")
| fields   - count
| sort                id device serial

Cheers,
Jacob

If you feel this response answered your question, please do not forget to mark it as such. If it did not, but you do have the answer, feel free to answer your own post and accept that as the answer.

russell120 · ‎10-01-2019

Hey @jacobevans ,

Thanks! This works perfectly for the example I provided, but perhaps I simplified my real data a bit too much. My issue is that I have a multivalue field.

If origin was a multivalue field and the values sales and devops were replaced with sales marketing and devops tech, respectively, what edit would we need in your solution for it to work?

Since I have a multivalue field, the stats count by line (line 9) makes a 2nd copy of each event to separate the multivalues into their own a single value.

jacobpevans · ‎10-01-2019

Try inserting | mvexpand origin between these lines:

| fillnull value="-"
| stats      count by id device serial origin

So that it now looks like this:

            | makeresults | eval id="111"   , device="routerAlpha" ,  serial="12345" , origin="sales"
 | append [ | makeresults | eval id=null()  , device="routerAlpha" ,  serial=null()  , origin=null()   ]
 | append [ | makeresults | eval id="222"   , device="switchBeta"  ,  serial="78787" , origin="devops" ]
 | append [ | makeresults | eval id=null()  , device="switchBeta"  ,  serial="78787" , origin=null()   ]
 | append [ | makeresults | eval id=null()  , device="desktopSigma",  serial="44444" , origin=null()   ]
 | append [ | makeresults | eval id=null()  , device="desktopSigma",  serial="44444" , origin=null()   ]
 | append [ | makeresults | eval id=null()  , device="serverTheta" ,  serial="55555" , origin=null()   ]
 | fillnull value="-"
 | mvexpand origin
 | stats      count by id device serial origin
 | eventstats count by device
 | search NOT(count>1 id="-")
 | fields   - count
 | sort                id device serial

Without testing, I believe that should work.

Cheers,
Jacob

If you feel this response answered your question, please do not forget to mark it as such. If it did not, but you do have the answer, feel free to answer your own post and accept that as the answer.

russell120 · ‎10-02-2019

Thanks a lot! I'll accept your original answer as the answer as it is the solution to the original question. I appreciate your help.

jacobpevans · ‎10-02-2019

You got it. Thank you for accepting the answer!

Cheers,
Jacob

If you feel this response answered your question, please do not forget to mark it as such. If it did not, but you do have the answer, feel free to answer your own post and accept that as the answer.

How to exclude duplicate events based on a field value in another event?

[Puzzles] Solve, Learn, Repeat: Dynamic formatting from XML events

Enter the Agentic Era with Splunk AI Assistant for SPL 1.4

Stronger Security with Federated Search for S3, GCP SQL & Australian Threat ...

Join the Conversation

How to exclude duplicate events based on a field value in another event?

[Puzzles] Solve, Learn, Repeat: Dynamic formatting from XML events

Enter the Agentic Era with Splunk AI Assistant for SPL 1.4

Stronger Security with Federated Search for S3, GCP SQL & Australian Threat ...