How do you remove all duplicate events from a search?

I have two indexes, A and B. Events are copied from index A to index B using the | collect command. Later, I run a search for all results in index A that are not in index B. Something like:

index=A NOT index=B

However, this does not remove an event that is in both indexes. Essentially, what I want is an anti-join: a left outer join (| join type=left) that keeps only the events with no match. As far as I can tell, Splunk's join command does not offer that type of join. | dedup does not seem to recognize the events as duplicates either. I also tried using _cd as a unique identifier; however, since _cd is tied to the event's location in the index, the two copies have different _cd values, so it cannot be used.
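One common way to get anti-join behavior in SPL without join is to search both indexes together and count, per event, how many distinct indexes it appears in. The sketch below assumes the copy in index B has a byte-identical _raw (which this thread has not confirmed) and uses the placeholder index names A and B from the question; note that where comparisons are case-sensitive and real Splunk index names are lowercase.

```spl
(index=A OR index=B) source="wineventlog:setup"
| eventstats dc(index) AS index_count BY _raw
| where index_count=1 AND index="A"
```

eventstats adds index_count to every event without collapsing the results, so what survives the where is the full original events from index A that have no copy in index B. An event present in both indexes gets index_count=2 and is dropped together with its copy.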

EDIT:

We are trying to let users of our dashboard "acknowledge" events. Currently, the user fills in some inputs that set tokens; a drilldown on a panel showing the event then runs a new search that uses | eval to add those token values to the event and | collect to copy the event into index B.
The idea is that our "to be acknowledged" search would then exclude any event that is in index B. Because of these issues, I have been testing without the | eval step, so the two events are exactly the same; the only difference is the index.

Splunk Employee

You should be able to use dedup, but pass it the specific list of fields that identify an event instead of relying on the whole _raw event.
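For reference, dedup takes an explicit list of fields and keeps the first event it sees for each distinct combination of their values. The field names below are illustrative only (they are fields commonly extracted from Windows event logs); substitute whatever fields identify an event in your data.

```spl
(index=A OR index=B) source="wineventlog:setup"
| dedup host EventCode RecordNumber
```

As the next reply points out, this still returns one copy of each duplicated event.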

Dedup always keeps at least one copy of each duplicated event. I want no matching events at all: if an event has a match, neither the original nor the duplicate should show up in the search.

Splunk Employee

OK, then try:

  <mysearch> |stats count by _raw | where count=1

That still gives me one of the duplicated events, the same as dedup.

Splunk Employee

count=1 means that there is no duplicate: the event is unique.
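If stats count by _raw | where count=1 still returns one copy of an event that exists in both indexes, the two _raw values cannot be identical, because identical values would be grouped into a single row with count=2. A quick check is to compare the length and a hash of _raw across the two indexes; len() and md5() are standard eval functions, raw_len and raw_hash are hypothetical field names, and the source filter is just the test subset used in this thread.

```spl
(index=A OR index=B) source="wineventlog:setup"
| eval raw_len=len(_raw), raw_hash=md5(_raw)
| table _time index raw_len raw_hash
```

If an original and its copy show different raw_hash values, whatever differs between them has to be found before any _raw-based match can work.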

What I mean is that, just like with dedup, I am getting one event, when what I want is for neither event to appear. If an event matches an event in the other index, I want neither of them to appear.

Path Finder

If your events have a unique ID field, you can use a subsearch to filter the events:

index=A NOT [search index=B | table ID]

If you do not have an ID field, you can list the fields that together identify an event:

index=A NOT [search index=B | table Field1, Field2, Field3]

If there is no difference between the events, you can also do this:

index=A NOT [search index=B]
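When a subsearch filter behaves unexpectedly, it can help to run the subsearch on its own and append | format, which shows the search string Splunk substitutes for the brackets. Field1, Field2 and Field3 are the placeholder names from the answer above; those fields must be extracted under the same names in index A for the generated filter to match anything.

```spl
index=B source="wineventlog:setup"
| table Field1 Field2 Field3
| format
```

The result is a single row whose search field holds something like ( ( Field1="x" AND Field2="y" AND Field3="z" ) OR ( ... ) ). Keep in mind that subsearches are capped by default (10,000 results and 60 seconds of runtime, configured in limits.conf), which matters once index B holds millions of events.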

Unfortunately, my events do not have a unique ID. I tried your second suggestion and listed several fields, but that returned no results at all.

Path Finder

Make sure you are not putting an operator in front of the subsearch. For example, you posted this:
index=A AND source="wineventlog:setup" _raw!=[search index=B AND source="wineventlog:setup" | dedup _raw]

Try this instead:
index=A AND source="wineventlog:setup" [search index=B AND source="wineventlog:setup" | dedup _raw | table _raw]
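Matching on the full _raw string in a subsearch can be fragile, since long raw events with quotes and line breaks all end up inside the generated filter. A variation that keeps the same idea but matches on a short hash is sketched below; raw_hash is a hypothetical field name computed with the eval function md5(), and the approach still assumes the original and its copy have identical _raw values.

```spl
index=A source="wineventlog:setup"
| eval raw_hash=md5(_raw)
| search NOT [search index=B source="wineventlog:setup"
    | eval raw_hash=md5(_raw)
    | dedup raw_hash
    | table raw_hash]
```

Because raw_hash is created by eval before the | search command runs, the NOT ( ( raw_hash="..." ) OR ... ) filter produced by the subsearch can be applied to it.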

The search you suggest returns no results at all.

Path Finder

Okay, there must be something else going on, then. I use subsearches like this quite frequently, though not with _raw. See the docs for more on subsearches:
https://docs.splunk.com/Documentation/SplunkCloud/7.0.3/SearchTutorial/Useasubsearch

Communicator

Are the events from the same sourcetype? Does the raw data have the same format? I am trying to replicate your use case.

I have used the source and sourcetype arguments of | collect so that the copies have the same source and sourcetype as the originals in index A. I believe the _raw fields are the same as well (they are not changed by the | collect command).
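Rather than relying on the belief that | collect leaves _raw unchanged, the copy step can be previewed with testmode=true, which returns the events as they would be written without actually writing them to index B. The source value below is the test source used in this thread; set sourcetype the same way if you override it.

```spl
index=A source="wineventlog:setup"
| collect index=B source="wineventlog:setup" testmode=true
```

Comparing the _raw shown in this preview with the original event is a direct check of the assumption that both copies are identical.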

Communicator

@strickland123456789 If you define field aliases that give the two fields the same alias name, running dedup on that alias will solve your problem.
http://docs.splunk.com/Documentation/Splunk/7.1.3/Knowledge/Abouttagsandaliases

The problem with using | dedup is that it still leaves one of the duplicates in the search results. If a match exists, I want neither the event in index A nor the one in index B to show up, but | dedup leaves the one in index A.

Communicator

Yes, I was trying to reply to that, but something went wrong!

You are right, dedup is not an option because it keeps the first occurrence of each field value.

Have you tried something like this?

index=A  fieldA!=
[search index=B
| dedup fieldB
| format ]
| .... etc

It will exclude events in index A whose fieldA value matches a fieldB value from index B.
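A comparison operator such as != cannot take a subsearch on its right-hand side, because the subsearch expands into a complete search expression rather than a single value. A form of this idea that does work puts NOT in front of the subsearch and renames fieldB to fieldA, so the generated filter refers to a field that exists in index A; fieldA and fieldB remain the placeholder names from the reply above.

```spl
index=A NOT [search index=B
    | dedup fieldB
    | rename fieldB AS fieldA
    | table fieldA]
```

Since the outer search covers only index A, the copies in index B never appear in the results, and the matching originals in A are filtered out by the NOT clause.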

I have to dedup and compare on _raw, so I tried this search:
index=A AND source="wineventlog:setup" _raw!=[search index=B AND source="wineventlog:setup" | dedup _raw]

(I'm filtering on source just for testing, as the full indexes could contain millions of events; these sources contain fewer than 100.)

However, this gives me an error:
Comparator '!=' has an invalid term on the right hand side
It then lists the exact duplicated event, so this seems to be close.

Communicator

Sorry, my mistake. With | format, the subsearch returns an already formatted search string, which cannot be used as the right-hand side of !=; without format, it still will not work that way.

It should be done the way @charlesmcdonald said.

Why does it need to be _raw? Why not use rex to extract a field? Can you paste an event?
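As an illustration of the rex approach, classic-format Windows event log text contains ComputerName= and RecordNumber= lines, which within a single event log together identify an event. The sketch below extracts them into hypothetical fields ack_host and ack_record and then drops every event whose key appears in both indexes; the regular expressions assume that text format and would need adjusting for other log types, which is the limitation raised in the reply that follows.

```spl
(index=A OR index=B) source="wineventlog:setup"
| rex field=_raw "ComputerName=(?<ack_host>\S+)"
| rex field=_raw "RecordNumber=(?<ack_record>\d+)"
| eventstats dc(index) AS index_count BY ack_host ack_record
| where index_count=1 AND index="A"
```

Events where either field fails to extract get no index_count and are filtered out as well, so it is worth checking the extraction on its own first.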

_raw seems to be the only field that I can be sure is unique to each event (shared only with its copy). I could use rex, but in general my events will come from a variety of event types, logs, and operating systems. I could paste an event, but it wouldn't really mean anything, as it is just a generic event.

SplunkTrust

@strickland123456789 what is the current query you are trying?
