Splunk Enterprise

Deletion of splunk sourcetype data

uagraw01
Motivator

Hello Splunkers !!

I hope all is well.


Some sourcetypes in Splunk hold a large amount of data, but we do not use them in any dashboards or saved searches. I want to delete those sourcetypes, and I have some questions about the deletion:

1. What is the best approach to delete the sourcetype data in Splunk (using the delete command, or from the backend)?
2. Does deleting historical data from those sourcetypes impact the other, useful sourcetypes?
3. Can it corrupt the buckets?
4. The unused sourcetypes hold millions of events. What is the fastest way to delete such large chunks of historical data?

Thanks in advance. Advice and suggestions are really appreciated !!


jawahir007
Communicator

1. Using the delete command

In Splunk, the delete command marks events as deleted so that they are not returned in future search results. It does not physically remove the events from disk or from the index, so it does not free any disk space. The events are still present in the index but flagged as deleted.
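As a rough illustration of the above, a delete search might look like the sketch below. The index and sourcetype names are placeholders, and the account running it needs the can_delete role, which no user has by default (not even admin). It is worth running the search without the final pipe first to confirm which events match.

```
index=your_index sourcetype=unused_sourcetype earliest=0
| delete
```

Once run, the matching events stop appearing in searches, but the buckets on disk stay the same size.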

2. Permanently delete data via index cleanup (retention policies)

To physically delete data from Splunk's indexes, you typically rely on index retention policies. Splunk automatically deletes older data based on index size or time-based retention policies.

Set Index Retention Policies:

  • Maximum Size (based on disk usage): Once the index exceeds a defined size, Splunk will delete the oldest data.
  • Time-based Retention: Splunk can automatically remove data that is older than a specific period (e.g., data older than 30 days).

Steps:

  1. Modify the indexes.conf file, located in $SPLUNK_HOME/etc/system/local/indexes.conf or within an app-specific folder.

  2. Example configuration for size- or time-based retention:

     
    [your_index]
    # Maximum size of the index in MB
    maxTotalDataSizeMB = 5000
    # 30 days in seconds (30 * 24 * 60 * 60)
    frozenTimePeriodInSecs = 2592000
    
    • maxTotalDataSizeMB: Sets the maximum disk space the index can use. When this limit is reached, the oldest buckets are frozen, which by default means they are deleted.
    • frozenTimePeriodInSecs: Specifies the number of seconds to retain the data. A bucket is frozen (deleted by default, unless archiving is configured) once the newest event in it is older than this.
  3. After the index reaches the size or time threshold, old data is deleted automatically by Splunk.

uagraw01
Motivator

@richgalloway @jawahir007 

Thank you both for the nice explanation. 

As part of my migration activity, I want to clean up or remove all the unnecessary sourcetypes from Splunk so that we use less disk space and can move data more quickly from the old server to the new one. But as per your suggestions, the delete command will never reduce disk space, so the entire data set will have to be copied during the migration. Am I understanding it correctly?

Some additional details on my original question:

1. All the sourcetypes come from one source.

2. All the sourcetypes belong to a single index.

3. We use props and transforms to build the sourcetypes. When an event matches a particular pattern, the transform assigns the sourcetype (based on the regex defined in it).

4. All the parsing and filtering is handled by a Python script.

5. Both the unnecessary and the necessary sourcetypes are in that one index.

 

Thanks 

 



richgalloway
SplunkTrust

Individual sourcetypes cannot be deleted.  Data is deleted by the bucket, which is a subset of an index.  When a bucket is deleted, all events in that bucket are removed from the system.

The delete command does not delete data.  It merely hides it from view.

There is no backend command to delete data.

If you are fortunate, the undesired sourcetypes are the only ones in their respective indexes.  In that case you can set the frozenTimePeriodInSecs for the index(es) to 1 and wait for Splunk to delete the buckets in the index(es).
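To check which case applies, a search along these lines can list the sourcetypes present in each index. This is only a sketch: it reports event counts (not disk usage), it should be run over All Time, and index=* can be narrowed to the indexes of interest.

```
| tstats count where index=* by index, sourcetype
| sort index, -count
```

An index whose rows are all unwanted sourcetypes is a candidate for the frozenTimePeriodInSecs approach.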

If you are like most sites and have a mixture of sourcetypes in your indexes then it becomes more of a challenge.  One option:

  1. Copy the sourcetypes you wish to keep into a different index using the collect command.  This will impact your ingestion license.
  2. Set frozenTimePeriodInSecs on the original index to 1 and wait for buckets to be deleted.  This will delete everything in the index.  On-prem environments can use the clean CLI command to delete the index.
  3. Revert the frozenTimePeriodInSecs setting.
  4. Use the collect command to copy the desired data back to the original index.  This avoids having to change the queries that use that index name and will impact your ingestion license (again).  In an on-prem environment, you can rename the index to the original name.
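Step 1 of the list above could be sketched roughly as follows. The index and sourcetype names are placeholders, and the target index is assumed to exist already. By default collect writes events with the sourcetype stash, so preserving the original sourcetype needs extra handling; the collect documentation covers the options and their license implications.

```
index=original_index sourcetype IN ("keep_sourcetype_a", "keep_sourcetype_b") earliest=0
| collect index=temp_keep_index
```

For large volumes it is probably safer to run this in smaller time slices and to compare event counts between the two indexes before moving on to step 2, since step 2 removes everything in the original index.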

See https://docs.splunk.com/Documentation/Splunk/9.3.0/Indexer/RemovedatafromSplunk#Remove_all_data_from... for more information.
