Deployment Architecture

delete/clean data for a specific time period

dhsetty
Explorer

I need help cleaning/deleting data for a particular time period. The data needs to be permanently removed, and the disk space reclaimed, for a time range, say from March 12th 2017 04:00 PM to March 12th 2017 07:00 PM. Deletion by time range is really required. Please help me achieve this. If there is no direct way, please suggest any other steps that would do it. Thanks in advance.

0 Karma

dhsetty
Explorer

Are there any other solutions or inputs for the above query?

How can data be deleted/cleaned for a specific time period?

If the indexes themselves have to be deleted for the specific time period, how can that be done?

0 Karma

puneethgowda
Communicator

Not sure if the answer below will help you.

You can check the duplicated events along with their time of indexing with the query below:

index=your_index sourcetype=your_sourcetype | eval dup=_raw | convert ctime(_time) as T1 | convert ctime(_indextime) as indextime | transaction dup mvlist=t maxspan=1s keepevicted=true | table dup,source,sourcetype,host,index,indextime

Process to delete the duplicated events:

  1. Run the following command to store all duplicate events in a lookup table.

index=* sourcetype=wsa_accesslogs | eval id=_cd."|".index."|".splunk_server | transaction _raw maxspan=1s keepevicted=true mvlist=t | search eventcount>1 | eval delete_id=mvindex(id, 1, -1) | stats c by delete_id | outputlookup delete_these.csv

  2. Once the search has finished completely, run the following command to view the events stored in the lookup table.

| inputlookup delete_these.csv

Note: You need to wait until your search completes. You can use smart mode as well.
You can also check the newly created lookup table at $SPLUNK_HOME\etc\apps\app_name\lookups\delete_these.csv

  3. Run the following command to delete all events of the source type that also exist in the lookup table (in your case it's delete_these.csv).

index=* sourcetype=wsa_accesslogs | eval delete_id=_cd."|".index."|".splunk_server | search [|inputlookup delete_these.csv | fields delete_id | format "(" "(" "OR" ")" "OR" ")"] | delete
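To make the selection logic in step 1 easier to follow, here is a minimal Python sketch of what "eventcount>1" plus "mvindex(id, 1, -1)" amounts to: within each group of identical raw events, the first copy is kept and the ids of all later copies are collected for deletion. This is an illustration only, not Splunk code. The function name, the sample ids, and the sample events are hypothetical, and the 1-second maxspan window of the transaction command is left out for brevity.

```python
from collections import defaultdict


def ids_to_delete(events):
    """Return the ids of every duplicate except the first copy in each group.

    `events` is an iterable of (event_id, raw_text) pairs in search order.
    """
    groups = defaultdict(list)
    for event_id, raw in events:
        groups[raw].append(event_id)

    delete_ids = []
    for ids in groups.values():
        if len(ids) > 1:                 # same idea as: search eventcount>1
            delete_ids.extend(ids[1:])   # same idea as: mvindex(id, 1, -1)
    return delete_ids


# Hypothetical sample data: ids follow the _cd|index|splunk_server pattern.
sample = [
    ("0:1|main|idx1", "GET /a"),
    ("0:2|main|idx1", "GET /a"),
    ("0:3|main|idx1", "GET /b"),
    ("0:4|main|idx1", "GET /a"),
]
print(ids_to_delete(sample))
```

Only the second and third copies of "GET /a" are selected, so one copy of every event survives the delete.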

0 Karma

dhsetty
Explorer

Tried "| delete" as below:

/data/third_party/splunk/bin/splunk SEARCH "starttime=04/19/2017:06:09:00 endtime=04/20/2017:23:59:00 | delete" -auth admin:changeme

But the Splunk data within that time range is not getting deleted. Any other suggestions?

0 Karma

somesoni2
Revered Legend

You haven't provided any index/sourcetype in your search, so it will only try to delete events from the indexes that you search by default.
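As a rough sketch of that point, the Python helper below builds a search string that names the index explicitly and bounds the time range with the earliest and latest time modifiers, using the %m/%d/%Y:%H:%M:%S format. The helper name and the index name "main" are assumptions for illustration; substitute the index that actually holds the data.

```python
from datetime import datetime

# Time format accepted by the earliest/latest modifiers for absolute times.
SPLUNK_TIME_FORMAT = "%m/%d/%Y:%H:%M:%S"


def build_delete_search(index, start, end):
    """Build a search string that pipes one index and time range to delete."""
    earliest = start.strftime(SPLUNK_TIME_FORMAT)
    latest = end.strftime(SPLUNK_TIME_FORMAT)
    return f'index={index} earliest="{earliest}" latest="{latest}" | delete'


# The range from the original question; "main" is an assumed index name.
query = build_delete_search(
    "main",
    datetime(2017, 3, 12, 16, 0, 0),
    datetime(2017, 3, 12, 19, 0, 0),
)
print(query)
```

Running the same search without the trailing "| delete" first shows which events would be affected.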

0 Karma

cpetterborg
SplunkTrust

The only way to reclaim space is to delete the buckets where the data is stored. This will delete ALL the data that is contained in that bucket. If you want to delete the data completely and reclaim the space, each and every bucket that contains the data will have to be removed, and if you have other data in those buckets, it will be lost as well.

I've been down this road before, and it is the ONLY way. You can try to go into the buckets and delete data, but it will corrupt the files. If you MODIFY data it will mess with the checksums of the files.

If you don't care about the other data, then deleting the buckets can be done by going into the index directory and deleting the buckets that hold the data. The dates of the oldest and newest events are found in the name of the bucket. Delete those that have data in the time range you want to delete and you will accomplish your goal.
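Picking out the buckets by name can be sketched in Python as follows. This assumes the db_<newest>_<oldest>_<id> naming used by warm and cold buckets in recent Splunk versions (rb_ for replicated copies), with both times as epoch seconds. The function names and the sample bucket names are hypothetical, and the time range is treated as UTC, which may not match your timezone. The sketch only lists candidates and deletes nothing.

```python
import re
from datetime import datetime, timezone

# db_<newest_epoch>_<oldest_epoch>_<id>; rb_ marks a replicated copy.
BUCKET_NAME = re.compile(r"^(?:db|rb)_(\d+)_(\d+)_(\d+)")


def to_epoch(year, month, day, hour, minute=0):
    """Convert a UTC wall-clock time to epoch seconds."""
    return int(datetime(year, month, day, hour, minute, tzinfo=timezone.utc).timestamp())


def buckets_overlapping(names, start_epoch, end_epoch):
    """Return the bucket names whose event time span overlaps the range."""
    hits = []
    for name in names:
        match = BUCKET_NAME.match(name)
        if not match:
            continue  # hot buckets and other entries do not follow this pattern
        newest, oldest = int(match.group(1)), int(match.group(2))
        if oldest <= end_epoch and newest >= start_epoch:
            hits.append(name)
    return hits


# Hypothetical directory listing of an index's db directory.
names = [
    "db_1489300000_1489200000_11",
    "db_1489340000_1489330000_12",
    "db_1489400000_1489350000_13",
    "hot_v1_14",
]
start = to_epoch(2017, 3, 12, 16)   # March 12th 2017 04:00 PM UTC
end = to_epoch(2017, 3, 12, 19)     # March 12th 2017 07:00 PM UTC
print(buckets_overlapping(names, start, end))
```

A bucket is reported as soon as its span touches the range, so it may also hold events from outside the range, and those would be lost along with it.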

If you have a cluster with replication, this may be a bit more complicated, but I think that the buckets will be the same, just flagged differently (as replicants) on the other indexers.

mmahadh
New Member

Hey Carry,

Since deleting the buckets is the only way to reclaim space, I have some questions about buckets.
I don't have buckets in the "db_newesttime_oldesttime" naming convention.
All I have is "defaultdb" and "metaeventdb" in my Splunk, which contain .tsidx files and raw data.

Can I do a clean based on the .tsidx timestamp as well? Or do I need to delete the defaultdb/metaeventdb?

Also, the index names mentioned in the documentation are "main", "_internal" and "_audit".
Is *.tsidx also the index?
How do I identify an index directory?

If you can give an example of a bucket name, I could search for something similar in my Splunk as well.

Thanks in advance.

Note: I'm using Splunk version 3.4.13 as of now.

0 Karma

cpetterborg
SplunkTrust

My experience with Splunk really started with 4.2, so I'm not familiar with the 3.4.x method of bucket naming. I wish that I could be of more help. I would suggest asking your question specifically in a new question here and see if there is someone with the answer you are looking for.

0 Karma

dhsetty
Explorer

Thanks, cpetterborg. I will try your suggestion and update.

0 Karma

dineshraj9
Builder

Add the can_delete role to the user ID that you want to use to delete the data.
Log in with that user ID, run your search for the required duration, and add " | delete" at the end.

For example, for March 12th 2017 04:00 PM to March 12th 2017 07:00 PM:

index=your_index earliest="03/12/2017:16:00:00" latest="03/12/2017:19:00:00" | delete

dhsetty
Explorer

Hi Dineshraj,

Tried "| delete" as below:

/data/third_party/splunk/bin/splunk SEARCH "starttime=04/19/2017:06:09:00 endtime=04/20/2017:23:59:00 | delete" -auth admin:changeme

But no luck. It is not getting deleted from the database...

0 Karma

dhsetty
Explorer

Thanks dineshraj for your quick reply. I have seen in the documentation that delete does not reclaim the space.

0 Karma

dhsetty
Explorer

After delete, is there any mechanism to clear only this deleted data?

0 Karma