How to exclude duplicate events based on a field value in another event?

russell120
Communicator

Hi, I have an "asset discovery" type of query that uses a CSV and 4+ indexes and produces tens of thousands of results that look similar to this:

id   device        serial  origin
111  routerAlpha   12345   sales
     routerAlpha   12345
222  switchBeta    78787   devops
     switchBeta    78787
     desktopSigma  44444
     desktopSigma  44444
     serverTheta   55555

How do I:
1) Remove all events that have a serial with a null id IF the same serial appears again in an event with a numerical id
2) Then, deduplicate all other events that have a serial # with a null id

The result should look like this:

id   device        serial  origin
111  routerAlpha   12345   sales
222  switchBeta    78787   devops
     desktopSigma  44444
     serverTheta   55555

jacobpevans
Motivator

Greetings @russell120,

This run-anywhere search should give you what you need. fillnull is necessary because stats drops any row where one of its by-fields is null. stats count by id device serial origin then collapses exact duplicate rows, and eventstats count by device counts how many distinct rows remain for each device. If a device has more than one row AND id is null (now "-"), that row is dropped. You can use any placeholder you want instead of "-", and you can use replace to change it back to null at the end if necessary.

           | makeresults | eval id="111"   , device="routerAlpha" ,  serial="12345" , origin="sales"
| append [ | makeresults | eval id=null()  , device="routerAlpha" ,  serial="12345" , origin=null()   ]
| append [ | makeresults | eval id="222"   , device="switchBeta"  ,  serial="78787" , origin="devops" ]
| append [ | makeresults | eval id=null()  , device="switchBeta"  ,  serial="78787" , origin=null()   ]
| append [ | makeresults | eval id=null()  , device="desktopSigma",  serial="44444" , origin=null()   ]
| append [ | makeresults | eval id=null()  , device="desktopSigma",  serial="44444" , origin=null()   ]
| append [ | makeresults | eval id=null()  , device="serverTheta" ,  serial="55555" , origin=null()   ]
| fillnull value="-"
| stats      count by id device serial origin
| eventstats count by device
| search NOT(count>1 id="-")
| fields   - count
| sort                id device serial
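
For anyone who wants to check the filtering logic outside Splunk, here is a minimal Python sketch of the same four steps. It only models the behavior described above and is not Splunk code; the function name dedupe_assets and its key and filler parameters are hypothetical. The search above groups by device, while the question is phrased in terms of serial, so the grouping field is left as a parameter.

```python
from collections import Counter

FIELDS = ("id", "device", "serial", "origin")

def dedupe_assets(events, key="device", filler="-"):
    """Model of: fillnull -> stats count by all fields -> eventstats count by key -> search NOT(...)."""
    # fillnull: replace missing or empty values so grouping keeps those rows
    filled = [tuple(e.get(f) or filler for f in FIELDS) for e in events]
    # stats count by id device serial origin: one row per distinct combination
    distinct = sorted(set(filled))
    # eventstats count by <key>: how many distinct rows share each key value
    pos = FIELDS.index(key)
    per_key = Counter(row[pos] for row in distinct)
    # search NOT(count>1 id="-"): drop null-id rows whose key also has another row
    kept = [r for r in distinct if not (per_key[r[pos]] > 1 and r[0] == filler)]
    return [dict(zip(FIELDS, r)) for r in kept]

events = [
    {"id": "111", "device": "routerAlpha", "serial": "12345", "origin": "sales"},
    {"device": "routerAlpha", "serial": "12345"},
    {"id": "222", "device": "switchBeta", "serial": "78787", "origin": "devops"},
    {"device": "switchBeta", "serial": "78787"},
    {"device": "desktopSigma", "serial": "44444"},
    {"device": "desktopSigma", "serial": "44444"},
    {"device": "serverTheta", "serial": "55555"},
]

for row in dedupe_assets(events):
    print(row)
```

Calling dedupe_assets(events, key="serial") applies the same rule per serial instead of per device, which is closer to how the question is worded.
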
Cheers,
Jacob

If you feel this response answered your question, please do not forget to mark it as such. If it did not, but you do have the answer, feel free to answer your own post and accept that as the answer.

russell120
Communicator

Hey @jacobpevans,

Thanks! This works perfectly for the example I provided, but perhaps I simplified my real data a bit too much. My issue is that I have a multivalue field.

If origin were a multivalue field, with the values sales and devops replaced by the value pairs (sales, marketing) and (devops, tech) respectively, what would need to change in your solution for it to work?

Because I have a multivalue field, the stats count by line (line 9) splits each event into one copy per value, so every multivalue event shows up twice, once for each single value.


jacobpevans
Motivator

Try inserting | mvexpand origin between these lines:

| fillnull value="-"
| stats      count by id device serial origin

So that it now looks like this:

            | makeresults | eval id="111"   , device="routerAlpha" ,  serial="12345" , origin="sales"
 | append [ | makeresults | eval id=null()  , device="routerAlpha" ,  serial="12345" , origin=null()   ]
 | append [ | makeresults | eval id="222"   , device="switchBeta"  ,  serial="78787" , origin="devops" ]
 | append [ | makeresults | eval id=null()  , device="switchBeta"  ,  serial="78787" , origin=null()   ]
 | append [ | makeresults | eval id=null()  , device="desktopSigma",  serial="44444" , origin=null()   ]
 | append [ | makeresults | eval id=null()  , device="desktopSigma",  serial="44444" , origin=null()   ]
 | append [ | makeresults | eval id=null()  , device="serverTheta" ,  serial="55555" , origin=null()   ]
 | fillnull value="-"
 | mvexpand origin
 | stats      count by id device serial origin
 | eventstats count by device
 | search NOT(count>1 id="-")
 | fields   - count
 | sort                id device serial

Without testing, I believe that should work.
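
To make the effect on row counts concrete, here is a small Python sketch (a model only, not Splunk code) of what expanding a multivalue field does, next to one possible alternative: joining the values into a single string so each event stays on one row. The helper names mvexpand and mvjoin_field are hypothetical stand-ins; in SPL the corresponding pieces would be the mvexpand command and the mvjoin eval function. Which one fits depends on whether you want one row per origin value or one row per device.

```python
def mvexpand(rows, field):
    """Model of mvexpand: emit one output row per value of a multivalue field."""
    out = []
    for row in rows:
        values = row[field] if isinstance(row[field], list) else [row[field]]
        for value in values:
            out.append({**row, field: value})
    return out

def mvjoin_field(rows, field, sep=" "):
    """Alternative: collapse a multivalue field into one string so rows are not split."""
    return [
        {**row, field: sep.join(row[field]) if isinstance(row[field], list) else row[field]}
        for row in rows
    ]

rows = [
    {"id": "111", "device": "routerAlpha", "serial": "12345", "origin": ["sales", "marketing"]},
    {"id": "222", "device": "switchBeta", "serial": "78787", "origin": ["devops", "tech"]},
]

expanded = mvexpand(rows, "origin")    # one row per origin value
joined = mvjoin_field(rows, "origin")  # one row per event, origin as a single string
print(len(expanded), len(joined))
```

With the two sample events, expanding yields four rows and joining keeps two, so if the goal is one line per device, collapsing the field before the stats step may be the better fit.
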

Cheers,
Jacob


russell120
Communicator

Thanks a lot! I'll accept your original answer as the answer as it is the solution to the original question. I appreciate your help.


jacobpevans
Motivator

You got it. Thank you for accepting the answer!

Cheers,
Jacob
