I need help cleaning/deleting data for a particular time period. The data needs to be permanently removed, and the disk space reclaimed, for a time range such as March 12th 2017 04:00 PM to March 12th 2017 07:00 PM. Deleting by time range is a hard requirement. Please help me achieve this. If there is no direct way, please suggest any other steps that would work. Thanks in advance.
Are there any other solutions or inputs for the above query?
How can data be deleted/cleaned for a specific time period?
The indexes also need to be deleted for that time period. How can that be done?
Not sure if the answer below will help you.
You can check the duplicated events, along with their time of indexing, with the query below:
index=<your_index> sourcetype=<your_sourcetype> | eval dup=_raw | convert ctime(_time) as T1 | convert ctime(_indextime) as indextime | transaction dup mvlist=t maxspan=1s keepevicted=true | table dup,source,sourcetype,host,index,indextime
Process to delete the duplicated events:
index=* sourcetype=wsa_accesslogs | eval id=_cd."|".index."|".splunk_server | transaction _raw maxspan=1s keepevicted=true mvlist=t | search eventcount>1 | eval delete_id=mvindex(id, 1, -1) | stats c by delete_id | outputlookup delete_these.csv
Note: You need to wait until your search completes. You can use smart mode as well.
You can also check the newly created lookup table at $SPLUNK_HOME\etc\apps\app_name\lookups\delete_these.csv
Then run the following search to delete the events listed in the lookup:
index=* sourcetype=wsa_accesslogs | eval delete_id=_cd."|".index."|".splunk_server | search [|inputlookup delete_these.csv | fields delete_id | format "(" "(" "OR" ")" "OR" ")"] | delete
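To make the logic of the two searches above easier to follow, here is a minimal Python sketch of the same idea: group events with identical raw text that fall within one second of each other, keep the first copy, and collect the ids of the rest. This is only an illustration, not how Splunk implements `transaction`. The tuple layout and the function name `ids_to_delete` are hypothetical, and the copy that Splunk keeps may differ from the one kept here.

```python
def ids_to_delete(events, maxspan=1):
    """Return ids of duplicate events, keeping the first copy in each group.

    `events` is a list of (event_id, epoch_seconds, raw_text) tuples.
    This shape is an assumption: event_id stands in for the
    _cd|index|splunk_server string built in the search, and raw_text for _raw.
    """
    doomed = []
    group_start = {}  # raw text -> timestamp of the copy currently being kept
    for event_id, ts, raw in sorted(events, key=lambda e: e[1]):
        start = group_start.get(raw)
        if start is not None and ts - start <= maxspan:
            # Same raw text within maxspan seconds of the kept copy: a duplicate.
            doomed.append(event_id)
        else:
            # First time this text is seen, or the span expired: start a new group.
            group_start[raw] = ts
    return doomed


sample = [
    ("a", 100, "x"),
    ("b", 100, "x"),
    ("c", 101, "x"),
    ("d", 105, "x"),
    ("e", 100, "y"),
]
duplicates = ids_to_delete(sample)
```

Here "b" and "c" are treated as duplicates of "a", while "d" is kept because it falls outside the one-second span.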
Tried "| delete" as below:
/data/third_party/splunk/bin/splunk SEARCH "starttime=04/19/2017:06:09:00 endtime=04/20/2017:23:59:00 | delete" -auth admin:changeme
But the Splunk data within that time range is not getting deleted. Any other suggestions?
You haven't provided an index or sourcetype in your search, so it will only try to delete events from the indexes that you search by default.
The only way to reclaim space is to delete the buckets where the data is stored. This will delete ALL the data that is contained in that bucket. If you want to delete the data completely and reclaim the space, each and every bucket that contains the data will have to be removed, and if you have other data in those buckets, it will be lost as well.
I've been down this road before, and it is the ONLY way. If you try to go into the buckets and delete data, it will corrupt the files. If you MODIFY data, it will break the checksums of the files.
If you don't care about the other data, you can delete the buckets by going into the index directory and deleting the ones that contain the data. The dates of the oldest and newest events are found in the name of each bucket. Delete those that have data in the time range you want to remove and you will accomplish your goal.
If you have a cluster with replication, this may be a bit more complicated, but I think that the buckets will be the same, just flagged differently (as replicas) on the other indexers.
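The bucket selection described above can be sketched in Python. This is a minimal sketch under stated assumptions: bucket names follow the `db_<newest>_<oldest>_<id>` pattern with both times in epoch seconds, and replicated copies use an `rb_` prefix. The function name `buckets_overlapping` is hypothetical. It only lists candidates and deletes nothing, so the result can be reviewed first.

```python
import re

# Assumed naming: db_<newest>_<oldest>_<id> (rb_ for replicated copies),
# with newest/oldest as epoch seconds. Anything else is ignored.
BUCKET_RE = re.compile(r"^(?:db|rb)_(\d+)_(\d+)_(\d+)")


def buckets_overlapping(names, range_start, range_end):
    """Return the bucket names whose event time span overlaps the given range."""
    hits = []
    for name in names:
        match = BUCKET_RE.match(name)
        if not match:
            continue  # hot buckets and unrelated files do not match the pattern
        newest, oldest = int(match.group(1)), int(match.group(2))
        # Two intervals overlap when each one starts before the other ends.
        if oldest <= range_end and newest >= range_start:
            hits.append(name)
    return hits


names = [
    "db_1500_900_1",
    "db_900_500_2",
    "db_3000_2500_3",
    "db_2500_1800_4",
    "hot_v1_5",
    "rb_1200_1100_6",
]
candidates = buckets_overlapping(names, 1000, 2000)
```

In practice `names` would come from listing the index directory, and the range bounds would be the epoch-second values of the start and end of the window. Every bucket returned also contains any other events that share it, so removing it removes those too.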
Hey Carry,
Since deleting the buckets is the only way to reclaim space, I have some questions about buckets.
I don't have buckets that follow the "db_newesttime_oldesttime" naming convention.
All I have in my Splunk is "defaultdb" and "metaeventdb", which contain .tsidx files and raw data.
Can I do a clean based on the .tsidx timestamp as well, or do I need to delete defaultdb/metaeventdb?
Also, the index names mentioned in the documentation are "main", "_internal" and "_audit".
Is *.tsidx also the index?
How do I identify an index directory?
If you can give an example of a bucket name, I could search for similar stuff in my splunk as well.
Thanks in advance.
Note: I'm using Splunk version 3.4.13 as of now.
My experience with Splunk really started with 4.2, so I'm not familiar with the 3.4.x method of bucket naming. I wish that I could be of more help. I would suggest posting this as a new question here to see if someone has the answer you are looking for.
Thanks, cpetterborg. I will try your suggestion and update.
Add the can_delete role to the user account that you want to use to delete the data.
Log in with that user, run your search over the required time range, and add " | delete" at the end.
For example, search over "March 12th 2017 04:00 PM to March 12th 2017 07:00 PM" and append | delete
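As a rough illustration of the steps above, the following Python sketch builds a search string for the requested window using the `earliest` and `latest` time modifiers in the `%m/%d/%Y:%H:%M:%S` format. The index name `main` and the function name `build_delete_search` are assumptions for the example; substitute the index that actually holds the data. It is worth running the search without `| delete` first to confirm that it matches only the intended events.

```python
from datetime import datetime

# Time format used with the earliest/latest modifiers in this sketch.
SPLUNK_TIME_FORMAT = "%m/%d/%Y:%H:%M:%S"


def build_delete_search(index, start, end):
    """Build a search string that deletes events in [start, end) from one index."""
    earliest = start.strftime(SPLUNK_TIME_FORMAT)
    latest = end.strftime(SPLUNK_TIME_FORMAT)
    # Naming the index explicitly avoids relying on the default search indexes.
    return f"index={index} earliest={earliest} latest={latest} | delete"


# "main" is a placeholder index name, not taken from the thread's setup.
query = build_delete_search(
    "main",
    datetime(2017, 3, 12, 16, 0),
    datetime(2017, 3, 12, 19, 0),
)
print(query)
```

The resulting string can be pasted into the search bar or passed to the CLI search command by a user who holds the can_delete role. As noted elsewhere in this thread, this hides the events from searches but does not reclaim disk space.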
Hi Dineshraj,
Tried "| delete" as below:
/data/third_party/splunk/bin/splunk SEARCH "starttime=04/19/2017:06:09:00 endtime=04/20/2017:23:59:00 | delete" -auth admin:changeme
But no luck. It is not getting deleted from the database...
Thanks, Dineshraj, for your quick reply. I have seen in the documentation that delete does not reclaim the space.
After a delete, is there any mechanism to clear only this deleted data?