I've read up on delete and am familiar with the implications, but I'm having trouble figuring out how to mark events for deletion that are found in another index. The idea is very simple, but doesn't work. I'm basically trying to build a master index of unique IDs based on a daily incremental update of changes and additions. Similarly, I have a log file that indicates deleted records and I'd like to join those log results and pipe to delete to clean out my reference index.
index=pgbs | join type=inner Id [search index=pgbs-incremental] | delete
index=pgbs | join type=inner Id [search index=pgbs-audit extracted_EventType="Delete Entity"] | delete
Unfortunately it seems that delete cannot be invoked after join...
Error in 'delete' command: This command cannot be invoked after the non-streaming command 'join'.
You can only pipe raw events to delete so try this:
index=pgbs [search index=pgbs-incremental | fields Id] | delete
And this:
index=pgbs [search index=pgbs-audit extracted_EventType="Delete Entity" | fields Id] | delete
Be aware that delete does almost nothing useful other than prevent events from ever showing up in search results; it does not free up any disk space.
Try this instead:
index=pgbs [search index=pgbs-incremental | fields Id]
If it works, then add the | delete on the end. The limitation here is that subsearches have a default limit of 10,000 results, so you won't be able to delete more than 10,000 events at a time. But you could run this multiple times, choosing a smaller time range each time.
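Before running the delete, it may help to check how many distinct Ids the subsearch would return. This is only a sketch: it assumes Id is already an extracted field in pgbs-incremental, and distinct_ids is just an illustrative field name.

```spl
index=pgbs-incremental | stats dc(Id) AS distinct_ids
```

If the number is above 10,000, the subsearch results would be truncated. Narrowing the time range of the subsearch, as suggested above, is one way to stay under the limit.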
Thanks! It isn't obvious to me why the syntax works, but it does. The alternative was that dedup would have to sift through more and more and more events. Thanks again!
Thanks to lguinn as well, who mentioned the same and added the reminder of the subsearch limits. I had considered the same and I have incorporated your guidance into my process documentation.
I told you why it does and does not work. You can only delete events, which means that you cannot delete non-events. Think about it: once you pass events into a transforming or other non-streaming command, you are no longer working with events.
Yes, sorry. I understood your explanation why deleting after a join wasn't valid. What I didn't initially understand is why "index=pgbs [search index=pgbs-incremental | fields Id]" finds the records that I'm looking for since there isn't an explicit match on the Id field. I think the answer is that this syntax creates a free text search on the ID values, right? And if I happened to have another field "Previous ID" or "Reference ID" or even just a completely unrelated field with a random match, it would delete that record too, right?
It is a correlating subsearch: http://docs.splunk.com/Documentation/Splunk/6.2.4/Search/Usesubsearchtocorrelateevents
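To see why this is a field match rather than a free text search, one option is to run the subsearch on its own with the format command appended. A subsearch applies the same formatting implicitly before handing its results to the outer search. This is a sketch using the index and field names from the question:

```spl
index=pgbs-incremental | fields Id | format
```

The result should be a single row whose search field looks something like ( ( Id="1001" ) OR ( Id="1002" ) ), where 1001 and 1002 are made-up values for illustration. The outer search therefore effectively becomes index=pgbs ( ( Id="1001" ) OR ( Id="1002" ) ), which matches on the Id field only and not on other fields that happen to contain the same value.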
Holy cow! I know I'm very new to Splunk but I can't believe I haven't seen that yet, especially with all the reading up I did on the join command. That certainly allays my fears. Thanks for the reference!
Also, take a look at the search job inspector when you run the search. It will often show you the "expansion" of the subsearch, and shows a lot of other useful info about your search performance.
As I think about the syntax, is this a free text search that could potentially match the returned list of IDs against fields in pgbs other than ID?