We have a database server whose logs are pushed into Splunk. Those logs also contain user data such as the user's login email ID, phone numbers, personal email ID for reference, and similar details.
When a user leaves the organisation or their contract ends, we need these details to be deleted. Our developer team is asking me for a backend API, if one exists, that can delete this sensitive data immediately so that the user's privacy is maintained.
I suggested hashing the user parameter, but the DB team wants the data sent to Splunk for the record.
Can anyone suggest whether there is a backend API we can give to the developers to achieve this?
Thanks in advance for your replies.
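The hashing idea from the question could be sketched as follows. This is a minimal illustration, not a Splunk feature: it swaps in a keyed hash (HMAC-SHA256 from Python's standard library) instead of a plain hash, on the assumption that email addresses and phone numbers are guessable enough that an unkeyed hash could be reversed by hashing candidate values. `SECRET_KEY` and `pseudonymize` are hypothetical names.

```python
import hashlib
import hmac

# Hypothetical key; a real one would come from a secrets store,
# never hard-coded or logged alongside the data.
SECRET_KEY = b"replace-with-a-real-secret"


def pseudonymize(value: str, key: bytes = SECRET_KEY) -> str:
    """Return a stable keyed hash (HMAC-SHA256) of a user identifier.

    Assumes identifiers are case-insensitive, so they are trimmed and
    lower-cased first; the result is a 64-character hex string.
    """
    normalized = value.strip().lower()
    return hmac.new(key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


if __name__ == "__main__":
    # Same token for differently formatted copies of the same address.
    print(pseudonymize("Jane.Doe@example.com"))
    print(pseudonymize("  jane.doe@example.com "))
```

Because the same input always yields the same token, events for one user can still be correlated in searches. On its own this does not satisfy the DB team's wish to keep the raw values in Splunk.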
Individual events (log entries) cannot be deleted, by API or any other means. They will be removed automatically by Splunk when their buckets age out.
Fields within events cannot be changed, either.
PII should be masked before it is indexed.
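As a minimal sketch of masking before indexing, the Python below rewrites each log line before it is forwarded, assuming the application or a forwarding script has a hook where lines pass through. The regular expressions and the `<EMAIL>`/`<PHONE>` tokens are hypothetical and deliberately crude (the phone pattern, for instance, would also catch long numeric IDs). Splunk can also anonymize data at index time through `props.conf` settings such as `SEDCMD`, which keeps the masking on the Splunk side.

```python
import re

# Hypothetical, deliberately simple patterns; tune them to the real log format.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# 10 to 15 digits, optional leading "+", single spaces or hyphens between digits.
PHONE_RE = re.compile(r"\+?\d(?:[\s-]?\d){9,14}")


def mask_pii(line: str) -> str:
    """Replace email addresses and phone-like digit runs with fixed tokens."""
    line = EMAIL_RE.sub("<EMAIL>", line)  # emails first, so their digits are gone
    return PHONE_RE.sub("<PHONE>", line)


if __name__ == "__main__":
    print(mask_pii("login ok user=jane.doe@example.com phone=+1 555-123-4567"))
    # -> login ok user=<EMAIL> phone=<PHONE>
```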
I downvoted this post because, while technically true, it doesn't present the solution from the Splunk search point of view.
@richgalloway is correct that it is impossible to delete anything from an index. Once data is indexed, it stays there until it rolls out and is deleted. The one thing you can do is use the "delete" command, which does its best to make sure none of the "deleted" data is used in searches.
See https://docs.splunk.com/Documentation/Splunk/7.0.2/SearchReference/Delete for more information about the delete command, and http://docs.splunk.com/Documentation/Splunk/7.0.2/RESTTUT/RESTsearches on how to run searches via the API.
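A minimal sketch of what a developer-facing call could send, assuming the REST search endpoint from the second link (`/services/search/jobs` on the default management port 8089). The host, index (`app_logs`) and field (`user_email`) are hypothetical, and the account submitting the search is assumed to hold the `can_delete` role, without which the delete command fails. Only the request is built here; sending it with an authenticated HTTP client is left out.

```python
from urllib.parse import urlencode

# Hypothetical host; 8089 is Splunk's default management port.
SEARCH_JOBS_URL = "https://splunk.example.com:8089/services/search/jobs"


def spl_quote(value: str) -> str:
    """Wrap a value in double quotes for SPL, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def build_delete_request(index: str, field: str, value: str) -> tuple[str, str]:
    """Return (url, form-encoded body) for a search job piping matches to delete.

    Assumes the submitting account holds the can_delete role.
    """
    spl = f"search index={index} {field}={spl_quote(value)} | delete"
    body = urlencode({
        "search": spl,
        "exec_mode": "blocking",  # wait for the job to finish before responding
        "output_mode": "json",
    })
    return SEARCH_JOBS_URL, body


if __name__ == "__main__":
    url, body = build_delete_request("app_logs", "user_email", "jane.doe@example.com")
    print(url)
    print(body)
```

Even when such a search succeeds, the matching events are only hidden from searches; they stay on disk until their buckets roll out.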
So while what @richgalloway says is true, it isn't exactly the final answer.
While data isn't technically deleted until its bucket rolls out, you can pipe a search to the | delete command to remove the events in question. This marks the events as unsearchable until they are rolled out.
Implications here are as follows:
1) Data is no longer searchable from the GUI, CLI or API (the deleted events won't be returned in results).
2) Data still resides on disk, and anyone with shell access to the box could recover the indexed data from disk.
3) There is no dedicated delete endpoint in the API as such; delete has to be submitted as part of a search.
See more on the delete command here : https://docs.splunk.com/Documentation/Splunk/7.0.2/SearchReference/Delete
Additionally, all PII should be masked before it comes into Splunk. That is a best practice.