Splunk Search

dedup only when values match in two fields

CharterBT
Explorer

Here's an interesting problem. I need to write a query where Splunk removes an event only when its values in two specific fields both match those of an earlier event. For example, a mocked-up sample of my results shows this:

0.0.0.0 test
0.0.0.0 pass
0.0.0.0 pass

I'd like Splunk to only remove the second instance of "0.0.0.0 pass" while keeping the first instance as well as the "0.0.0.0 test" in my results.

Is there an easy way to do this? If it helps, the field name for the numbers is src and for the words is cs5. Any help is appreciated.

alacercogitatus
SplunkTrust

Dedup should be able to do this. If you post a little more about your end goal, there may be a more optimized approach. For example, do you want counts of how many times this happens?

your_search | dedup 2 src

http://docs.splunk.com/Documentation/Splunk/6.0/SearchReference/Dedup
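
For the sample in the question, it may help to compare two forms of dedup side by side. This is only a sketch, using the your_search placeholder and the src and cs5 field names from this thread. The first search keeps at most two events per src value, whatever cs5 contains. The second keeps the first event for each distinct src/cs5 combination:

```
your_search | dedup 2 src

your_search | dedup src cs5
```

On the three mocked-up events both forms return "0.0.0.0 test" and one "0.0.0.0 pass", but they diverge as soon as a src has more than two events or more than two distinct cs5 values.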

lukejadamec
Super Champion

To find the version, click About in the upper right of Splunk Web.

CharterBT
Explorer

I'm not sure of the version, but it's not 6 yet. Thanks for the tips. I'll try them and let you know how it goes.

alacercogitatus
SplunkTrust

That's where stats count by cs5 src works a little faster: stats is done at the indexer, while dedup is done at the search head. According to the docs, dedup src cs5 should do the same thing. What version are you using?
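
If the count itself is not wanted, one possible way to use stats purely for de-duplication is to drop the count column afterwards. A minimal sketch, assuming src and cs5 are already extracted fields:

```
your_search | stats count by src cs5 | fields - count
```

This returns one row per src/cs5 combination rather than the original events, and events missing either field are left out of the result, so it is not a drop-in replacement for dedup when the raw events are needed.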

CharterBT
Explorer

One other thing. I tried "dedup src, cs5", but it didn't retain any new "src" records after it found its first duplicate src value. I need the dedup to be a little smarter and only remove duplicate entries of the src/cs5 combination.

alacercogitatus
SplunkTrust

Then a better search is: your_search | stats dc(cs5) as DistinctInfections by src. This gives you each individual source and how many different infections they have over the time range. If you want how many of each infection per src, do your_search | stats count by cs5 src.
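
If it is useful to see which infections each source has alongside the distinct count, the two ideas can be combined in one stats call. This is a sketch only; Infections is a hypothetical output field name chosen for illustration:

```
your_search | stats values(cs5) as Infections dc(cs5) as DistinctInfections by src
```

Each row is one src, with Infections listing its distinct cs5 values and DistinctInfections counting them.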

CharterBT
Explorer

No, I don't need to know how many times it repeats.

Each # value is a computer, and each word value is a type of malware. Some computers have multiple infections, so I just need to remove the instances where that computer/malware combination has already been identified. My search is covering a month-long timeframe, so I don't need to count every time it shows up, just that it did at some point.

Does that help?
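
A search along these lines might fit that description: one row per computer/malware combination over the month, with the time it was first seen instead of a count. This is a sketch under the assumption that the event timestamp (_time) is the moment of detection; first_seen is a hypothetical field name:

```
your_search | stats min(_time) as first_seen by src cs5 | convert ctime(first_seen)
```

The convert ctime step only turns the epoch value into a readable timestamp and can be dropped if the raw value is fine.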
