Hello Splunkers !!
I hope all is well.
Some sourcetypes in Splunk hold a large amount of data, but we are not using them in any dashboards or saved searches. I want to delete those sourcetypes from Splunk, and I have the following questions about doing so.
1. What is the best approach for deleting a sourcetype's data in Splunk (using the delete command or from the backend)?
2. Does deleting historical data from those sourcetypes impact the other, useful sourcetypes?
3. Can it corrupt the buckets?
4. The unused sourcetypes hold millions of events, so what is the fastest way to delete large chunks of historical data?
Thanks in advance. Advice and suggestions are really appreciated !!
1. Using the delete Command
In Splunk, the delete command marks events as deleted so that they are no longer returned in search results. It does not physically remove the events from disk or from the index, so it frees no disk space. The events remain in the index, flagged as deleted.
2. Permanently Delete Data via Index Cleanup (Retention Policies)
To physically delete data from Splunk's indexes, you typically rely on index retention policies. Splunk automatically deletes older data based on index size or time-based retention policies.
Modify the indexes.conf file, located in $SPLUNK_HOME/etc/system/local/indexes.conf or within an app-specific folder.
Example configuration for size- or time-based retention:
[your_index]
# Maximum total size of the index, in MB
maxTotalDataSizeMB = 5000
# 30 days in seconds (30 * 24 * 60 * 60)
frozenTimePeriodInSecs = 2592000
Once the index exceeds the size limit, or the newest event in a bucket is older than the time limit, Splunk freezes the oldest buckets. By default, frozen buckets are deleted automatically.
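As a rough illustration of the arithmetic behind those settings, here is a small Python sketch that converts a retention period in days to seconds and prints a matching stanza. The function names (retention_seconds, render_stanza) are made up for this example and are not part of Splunk. The comments are written on their own lines because Splunk .conf files only treat a line as a comment when it starts with #.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def retention_seconds(days: int) -> int:
    """Convert a retention period in days to seconds."""
    return days * SECONDS_PER_DAY


def render_stanza(index_name: str, max_size_mb: int, retention_days: int) -> str:
    """Render an indexes.conf stanza with comments on their own lines."""
    return "\n".join([
        f"[{index_name}]",
        "# Maximum total size of the index, in MB",
        f"maxTotalDataSizeMB = {max_size_mb}",
        f"# {retention_days} days, in seconds",
        f"frozenTimePeriodInSecs = {retention_seconds(retention_days)}",
    ])


if __name__ == "__main__":
    # Same values as the example above: 5000 MB and 30 days.
    print(render_stanza("your_index", 5000, 30))
```

Review the printed stanza before copying it into indexes.conf; the index name and limits here are placeholders.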
Thank you both for the nice explanation.
As part of my migration activity, I want to clean up or remove all the unnecessary sourcetypes from Splunk so that we use less disk space and can move data more quickly from the old server to the new one. But as per your suggestion, the delete command will never reduce disk space, so in the migration all of the data will still have to be copied. Am I understanding this correctly?
Some additional details on my original question:
1. All the sourcetypes come from one source.
2. All the sourcetypes belong to a single index.
3. We use props and transforms to build the sourcetypes. When an event matching a particular pattern arrives, the transform assigns the sourcetype (based on the regex defined in it).
4. All the parsing and filtering is handled by a Python script.
5. Both the necessary and unnecessary sourcetypes are in that one index.
Thanks
Individual sourcetypes cannot be deleted. Data is deleted by the bucket, which is a subset of an index. When a bucket is deleted, all events in that bucket are removed from the system.
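To make the bucket-level behaviour concrete, here is a toy Python model (not Splunk code; the Event and Bucket classes and the apply_retention function are invented for illustration). It assumes a bucket is removed only once its newest event is older than the retention period, and shows that removing a bucket takes every sourcetype in it along.

```python
from dataclasses import dataclass


@dataclass
class Event:
    timestamp: int   # epoch seconds
    sourcetype: str


@dataclass
class Bucket:
    events: list

    @property
    def latest(self) -> int:
        """Timestamp of the newest event in the bucket."""
        return max(e.timestamp for e in self.events)


def apply_retention(buckets, now, frozen_secs):
    """Keep only buckets whose newest event is within the retention period."""
    return [b for b in buckets if now - b.latest <= frozen_secs]


def sourcetypes_in(buckets):
    """Sorted list of the sourcetypes present across the given buckets."""
    return sorted({e.sourcetype for b in buckets for e in b.events})


if __name__ == "__main__":
    old = Bucket([Event(100, "needed"), Event(150, "unused")])
    new = Bucket([Event(900, "needed"), Event(950, "unused")])
    kept = apply_retention([old, new], now=1000, frozen_secs=200)
    # The old bucket is dropped as a whole, "needed" events included;
    # the new bucket survives and still contains "unused" events.
    print(len(kept), sourcetypes_in(kept))
```

In this model there is no way to remove only the "unused" events: retention either drops a whole bucket or keeps it.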
The delete command does not delete data. It merely hides it from view.
There is no backend command to delete the data of an individual sourcetype.
If you are fortunate, the undesired sourcetypes are the only ones in their respective indexes. In that case you can set the frozenTimePeriodInSecs for the index(es) to 1 and wait for Splunk to delete the buckets in the index(es).
If you are like most sites and have a mixture of sourcetypes in your indexes then it becomes more of a challenge. One option:
See https://docs.splunk.com/Documentation/Splunk/9.3.0/Indexer/RemovedatafromSplunk#Remove_all_data_from... for more information.