
We have a shortage of disk space in one indexer. Can we delete data present in the colddb directory?

Motivator

We are currently running out of space on one of the 5 indexers in our distributed Splunk environment. We are using Splunk version 6.2.1.
The total size of the indexer volume is about 5.2TB. We are currently left with less than 100 GB of free space, and on average another 10GB is consumed every day. The data occupying the space is almost 3.5 years old, and most of it sits under the colddb directories on the mount point /splogs.

Disk Usage status

df -h /splogs
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/vg_splunk03_san-splunk_logs
                      5.6T  5.3T   93G  99% /splogs

We found that most of the space is occupied by these indexes:

[net_proxy], [net_fw], [unix_svrs] & [unix_bsm] 

Example:

[root@splunk03 splogs]# cd unix_svrs
[root@splunk03 unix_svrs]# ls -ltr
total 416
drwx------    2 splunk splunk   4096 Apr 19  2012 thaweddb
drwx------ 1590 splunk splunk 102400 Aug  6 09:18 colddb
drwx------ 1890 splunk splunk 131072 Aug  6 12:51 summary
drwx------ 1893 splunk splunk 143360 Aug  6 12:53 datamodel_summary
drwx------  307 splunk splunk  28672 Aug  6 12:54 db
[root@splunk03 unix_svrs]# du -sh *
1007G   colddb
1.6G    datamodel_summary
229G    db
366M    summary
4.0K    thaweddb

[root@splunk03 splogs]# cd net_fw
[root@splunk03 net_fw]# ls -ltr
total 612
drwx------    2 splunk splunk   4096 Apr 19  2012 thaweddb
drwx------ 1358 splunk splunk 131072 Sep 27  2015 summary
drwx------ 2956 splunk splunk 180224 Aug  6 12:17 colddb
drwx------ 3258 splunk splunk 266240 Aug  6 12:55 datamodel_summary
drwx------  313 splunk splunk  28672 Aug  6 12:55 db
[root@splunk03 net_fw]# du -sh *
1.3T    colddb
76G     datamodel_summary
147G    db
24M     summary
4.0K    thaweddb

indexes.conf details for these indexes

[volume:Hot]
path = /splogs

[volume:Cold]
path = /splogs

[volume:Base]
path = /splogs

[default]
frozenTimePeriodInSecs = 31536000

[net_fw]
homePath = volume:Hot/net_fw/db
coldPath = volume:Cold/net_fw/colddb
tstatsHomePath = volume:Hot/net_fw/datamodel_summary
thawedPath = $SPLUNK_DB/net_fw/thaweddb
maxTotalDataSizeMB = 250000

[unix_svrs]
homePath = volume:Hot/unix_svrs/db
coldPath = volume:Cold/unix_svrs/colddb
tstatsHomePath = volume:Hot/unix_svrs/datamodel_summary
thawedPath = $SPLUNK_DB/unix_svrs/thaweddb
maxTotalDataSizeMB = 250000

[summary]
frozenTimePeriodInSecs = 188697600

The other indexes are configured in the same manner in indexes.conf as shown above.

Kindly let me know whether we can delete the data present under the colddb directory for the indexes occupying more than 1TB. What would the impact of doing this be? Or is there another way to prevent the Splunk service from failing due to low disk space?


Re: We have a shortage of disk space in one indexer. Can we delete data present in the colddb directory?

SplunkTrust

Whether you can remove old buckets or not depends on whether you need the data in those buckets or not - we can't help you there.

That being said, looking at your config I have a few pointers.
Are your indexers sharing that 5.2TB? If so, are all five indexers writing into the same path? That's asking for trouble.
Doing the maths suggests this is the case: each indexer is configured to consume up to 250GB for each of those indexes. Multiplied by five, that's 1.25TB per index - and both are currently at about 1.25TB.
You should see old buckets being removed all the time - search index=_internal component=bucketmover idx=unix_svrs (or the other indexes). If you're at the maximum configured space, Splunk will throw out the oldest buckets on its own and the size should not grow further.
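Spelled out as a search you can paste in (a sketch - swap idx for whichever index you're checking, and widen the time range since freezing may not happen every day):

index=_internal component=BucketMover idx=net_fw earliest=-30d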
If you need more space for other indexes AND have figured out that you can throw out more old data, you could reduce maxTotalDataSizeMB on the indexers a bit; then they'll throw out more old buckets. Just deleting buckets while Splunk is using them is, again, asking for trouble.
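For example (a sketch with a hypothetical number - size the cap to the history you actually need):

[net_fw]
# hypothetical lower cap; Splunk will freeze (here: delete) the oldest buckets until the index fits
maxTotalDataSizeMB = 200000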

Another point: you've configured a year of data retention. Do check whether your disk is large enough to make it to one year, assuming that year is based on compliance ("must store a year") rather than privacy ("cannot store more than a year").
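Back-of-envelope with your numbers (a rough sketch):

31536000 s / 86400 = 365 days              ([default] retention)
188697600 s / 86400 = 2184 days, about 6 years  ([summary] retention)
365 days x ~10GB/day = roughly 3.6TB       (what a year of growth implies in disk)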


Re: We have a shortage of disk space in one indexer. Can we delete data present in the colddb directory?

Motivator

thanks Martin for throwing some light on this issue.

We have 5 individual physical servers, one Splunk indexer instance each, with 5.6TB of space configured on each, and each pointing to its own separate /splogs mount point. The problem is with only one of the indexer instances, which is running out of space.

I executed the query you mentioned in your comment, but I am not sure how to check whether the old data is being removed or not.

maxTotalDataSizeMB = 250000 (250GB) is set for only a few indexes in indexes.conf, but how do I check whether data is being deleted after reaching the 250GB mark? I mean, how and where do I check for frozen buckets? There is also no archive policy set in indexes.conf to archive the data.

There are two frozenTimePeriodInSecs settings: frozenTimePeriodInSecs = 31536000 under the [default] stanza and frozenTimePeriodInSecs = 188697600 under the [summary] stanza. Which one is taken into consideration when deleting the data?

Can rm -rf be used to delete the data present under the colddb directory?

thanks in advance.


Re: We have a shortage of disk space in one indexer. Can we delete data present in the colddb directory?

SplunkTrust

If that's the space used by one indexer alone, something else is going on.
Are your indexers replicating buckets as part of an indexer cluster? (There's an easy check for this - see the sketch after these questions.)
What do you see when you go to Settings -> Distributed Management Console -> Indexing -> Indexes and Volumes -> Index Detail: Instance and select the indexer and one of the two indexes? (Assuming recent version of Splunk)
When you ran the search I gave you, what results did you see?
Is the remaining free space shrinking?
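An easy way to answer the clustering question yourself (a sketch):

$SPLUNK_HOME/bin/splunk cmd btool server list --debug clustering

mode = disabled there means the indexer is not part of an indexer cluster.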

Regarding frozen archive policy: Not having an explicit configuration about what to do with data when it freezes means it gets deleted.
Regarding time until frozen: run $SPLUNK_HOME/bin/splunk cmd btool --debug indexes list net_fw to see which settings are in effect for that index. Buckets get deleted when the index's maximum space is filled or when a bucket crosses the frozen time period, whichever comes first.
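Concretely, bucket directories are named db_<newestEventEpoch>_<oldestEventEpoch>_<localId>, and a bucket becomes eligible to freeze by age once now - <newestEventEpoch> > frozenTimePeriodInSecs (31536000 s = 365 days in your case).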
Regarding rm -rf: I'd recommend letting Splunk delete buckets. Before you really run out of space you can always stop Splunk and rm the oldest buckets manually... but the best course of action is to understand what's going on and fix anything that's potentially broken, so that Splunk deletes buckets by itself.
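If it ever does come to manual cleanup, a sketch of a safe order of operations (paths assume your net_fw index - double-check every bucket before removing it):

# stop Splunk first - never delete buckets under a running indexer
$SPLUNK_HOME/bin/splunk stop
cd /splogs/net_fw/colddb
# db_<newestEventEpoch>_<oldestEventEpoch>_<id>: sorting numerically on
# field 2 (the newest-event epoch) lists the oldest buckets first
ls -d db_* | sort -t_ -k2,2n | head
# turn an epoch into a date to sanity-check what you would delete
date -d @1466869169
# rm -rf <oldest-bucket-dir>    # only once you are sure it is past retention
$SPLUNK_HOME/bin/splunk start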


Re: We have a shortage of disk space in one indexer. Can we delete data present in the colddb directory?

Motivator

thanks Martin. Yes, I checked all the indexer instances individually and found that on this particular indexer instance the data under "/splogs/net_fw/colddb/" is not being deleted after passing the frozen time period,
whereas on the other four indexer instances we found only data from May 16 to Jun 16.

Are your indexers replicating buckets as part of an indexer cluster?

Ours is not a clustered environment. We did not find a repFactor setting in any of the indexes.conf files; moreover, when we ran the command we found repFactor = 0 in the index settings. We also ran
./splunk cmd btool server list --debug shclustering
and it returned only default values from /etc/system/default/server.conf on all the indexer servers.

$SPLUNK_HOME/bin/splunk cmd btool --debug indexes list net_fw
/opt/splunk/etc/apps/ADMIN-all_indexers/local/indexes.conf [net_fw]
/opt/splunk/etc/system/default/indexes.conf assureUTF8 = false
/opt/splunk/etc/system/default/indexes.conf blockSignSize = 0
/opt/splunk/etc/system/default/indexes.conf blockSignatureDatabase = blocksignature
/opt/splunk/etc/system/default/indexes.conf bucketRebuildMemoryHint = auto
/opt/splunk/etc/apps/ADMIN-all_indexers/local/indexes.conf coldPath = volume:Cold/net_fw/colddb
/opt/splunk/etc/system/default/indexes.conf coldPath.maxDataSizeMB = 0
/opt/splunk/etc/system/default/indexes.conf coldToFrozenDir =
/opt/splunk/etc/system/default/indexes.conf coldToFrozenScript =
/opt/splunk/etc/system/default/indexes.conf compressRawdata = true
/opt/splunk/etc/system/default/indexes.conf defaultDatabase = main
/opt/splunk/etc/system/default/indexes.conf enableOnlineBucketRepair = true
/opt/splunk/etc/system/default/indexes.conf enableRealtimeSearch = true
/opt/splunk/etc/apps/ADMIN-all_indexers/local/indexes.conf frozenTimePeriodInSecs = 31536000
/opt/splunk/etc/apps/ADMIN-all_indexers/local/indexes.conf homePath = volume:Hot/net_fw/db
/opt/splunk/etc/system/default/indexes.conf homePath.maxDataSizeMB = 0
/opt/splunk/etc/system/default/indexes.conf hotBucketTimeRefreshInterval = 10
/opt/splunk/etc/system/default/indexes.conf indexThreads = auto
/opt/splunk/etc/system/default/indexes.conf maxBloomBackfillBucketAge = 30d
/opt/splunk/etc/system/default/indexes.conf maxBucketSizeCacheEntries = 0
/opt/splunk/etc/system/default/indexes.conf maxConcurrentOptimizes = 6
/opt/splunk/etc/system/default/indexes.conf maxDataSize = auto
/opt/splunk/etc/system/default/indexes.conf maxHotBuckets = 3
/opt/splunk/etc/system/default/indexes.conf maxHotIdleSecs = 0
/opt/splunk/etc/system/default/indexes.conf maxHotSpanSecs = 7776000
/opt/splunk/etc/system/default/indexes.conf maxMemMB = 5
/opt/splunk/etc/system/default/indexes.conf maxMetaEntries = 1000000
/opt/splunk/etc/system/default/indexes.conf maxRunningProcessGroups = 8
/opt/splunk/etc/system/default/indexes.conf maxRunningProcessGroupsLowPriority = 1
/opt/splunk/etc/system/default/indexes.conf maxTimeUnreplicatedNoAcks = 300
/opt/splunk/etc/system/default/indexes.conf maxTimeUnreplicatedWithAcks = 60
/opt/splunk/etc/apps/ADMIN-all_indexers/local/indexes.conf maxTotalDataSizeMB = 250000
/opt/splunk/etc/system/default/indexes.conf maxWarmDBCount = 300
/opt/splunk/etc/system/default/indexes.conf memPoolMB = auto
/opt/splunk/etc/system/default/indexes.conf minRawFileSyncSecs = disable
/opt/splunk/etc/system/default/indexes.conf minStreamGroupQueueSize = 2000
/opt/splunk/etc/system/default/indexes.conf partialServiceMetaPeriod = 0
/opt/splunk/etc/system/default/indexes.conf processTrackerServiceInterval = 1
/opt/splunk/etc/system/default/indexes.conf quarantineFutureSecs = 2592000
/opt/splunk/etc/system/default/indexes.conf quarantinePastSecs = 77760000
/opt/splunk/etc/system/default/indexes.conf rawChunkSizeBytes = 131072
/opt/splunk/etc/system/default/indexes.conf repFactor = 0
/opt/splunk/etc/system/default/indexes.conf rotatePeriodInSecs = 60
/opt/splunk/etc/system/default/indexes.conf serviceMetaPeriod = 25
/opt/splunk/etc/system/default/indexes.conf serviceOnlyAsNeeded = true
/opt/splunk/etc/system/default/indexes.conf serviceSubtaskTimingPeriod = 30
/opt/splunk/etc/system/default/indexes.conf streamingTargetTsidxSyncPeriodMsec = 5000
/opt/splunk/etc/system/default/indexes.conf suppressBannerList =
/opt/splunk/etc/system/default/indexes.conf sync = 0
/opt/splunk/etc/system/default/indexes.conf syncMeta = true
/opt/splunk/etc/apps/ADMIN-all_indexers/local/indexes.conf thawedPath = $SPLUNK_DB/net_fw/thaweddb
/opt/splunk/etc/system/default/indexes.conf throttleCheckPeriod = 15
/opt/splunk/etc/apps/ADMIN-all_indexers/local/indexes.conf tstatsHomePath = volume:Hot/net_fw/datamodel_summary
/opt/splunk/etc/system/default/indexes.conf warmToColdScript =

What do you see when you go to Settings -> Distributed Management Console -> Indexing -> Indexes and Volumes -> Index Detail: Instance and select the indexer and one of the two indexes?

I checked in the DMC but was unable to find the option Indexing -> Indexes and Volumes -> Index Detail on one of the search heads. We are using version 6.2.1.

When you ran the search I gave you, what results did you see?
Is the remaining free space shrinking?

index=_internal component=bucketmover idx=net_fw (interval: 60 min)

08-09-2016 11:09:53.114 -0400 INFO BucketMover - idx=net_fw Moving bucket='db_1466869169_1466854110_2713' because maximum number of warm databases exceeded, starting warm_to_cold: from='/splogs/net_fw/db' to='/splogs/net_fw/colddb'
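Decoding that bucket name (standard format db_<newestEventEpoch>_<oldestEventEpoch>_<id>) shows it holds June 2016 data:

date -d @1466869169   # Jun 25 2016 - newest event in the bucket
date -d @1466854110   # Jun 25 2016 - oldest event

So with a 365-day retention this bucket is not due to freeze until June 2017; the puzzle is why the much older buckets are still on disk.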

There are 46 indexes.conf files configured on each indexer instance, and four of them contain settings for this index. Not sure why four indexes.conf files were created for the same index.

1) /opt/splunk/etc/apps/ADMIN-all_indexers/default/indexes.conf

[volume:Hot]
path = /splogs

[volume:Cold]
path = /splogs

[volume:Base]
path = /splogs

[default]
frozenTimePeriodInSecs = 31536000

[net_fw]
homePath = volume:Hot/net_fw/db
coldPath = volume:Cold/net_fw/colddb
tstatsHomePath = volume:Hot/net_fw/datamodel_summary
thawedPath = $SPLUNK_DB/net_fw/thaweddb

2) /opt/splunk/etc/apps/ADMIN-all_indexers/local/indexes.conf

[volume:Hot]
path = /splogs

[volume:Cold]
path = /splogs

[volume:Base]
path = /splogs

[default]
frozenTimePeriodInSecs = 31536000

[net_fw]
homePath = volume:Hot/net_fw/db
coldPath = volume:Cold/net_fw/colddb
tstatsHomePath = volume:Hot/net_fw/datamodel_summary
thawedPath = $SPLUNK_DB/net_fw/thaweddb
maxTotalDataSizeMB = 250000 --> only this file includes this setting

3) /opt/splunk/etc/apps/allindexerbase/local/indexes.conf

[volume:Hot]
path = /splogs

[volume:Cold]
path = /splogs

[volume:Base]
path = /splogs

[net_fw]
homePath = volume:Hot/net_fw/db
coldPath = volume:Cold/net_fw/colddb
thawedPath = $SPLUNK_DB/net_fw/thaweddb

No frozenTimePeriodInSecs is set in this file.

4) /opt/splunk/etc/apps/allindexerbase/local/indexes.conf.2013.06.03

[volume:Hot]
path = /splogs

[volume:Cold]
path = /splogs

[volume:Base]
path = /splogs

[net_fw]
homePath = volume:Hot/net_fw/db
coldPath = volume:Cold/net_fw/colddb
thawedPath = $SPLUNK_DB/net_fw/thaweddb

No frozenTimePeriodInSecs is set in this file either.

I am sure going through this much detail is painful - sorry for that. But I wanted to share everything so we can find out what exactly is broken in my environment and why Splunk is not deleting the buckets.

thanks in advance.


Re: We have a shortage of disk space in one indexer. Can we delete data present in the colddb directory?

SplunkTrust

The configuration as output by btool looks good: no replication going on, and the 250GB ceiling was recognized. You should eventually clean up the four different locations all defining indexes.conf, but that's not the issue here - btool merges things correctly.
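To illustrate how the layering resolves for maxTotalDataSizeMB (a sketch - btool --debug already prints the winning file on each line):

/opt/splunk/etc/apps/ADMIN-all_indexers/local/indexes.conf  maxTotalDataSizeMB = 250000  <- app local wins
/opt/splunk/etc/system/default/indexes.conf                 maxTotalDataSizeMB = 500000  <- system default, overridden

And note that indexes.conf.2013.06.03 is never read at all - Splunk only loads files named exactly indexes.conf.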

Regarding DMC - I think the Indexes views were added in 6.3 or 6.4.
As an alternative, you can run | dbinspect index=_internal | search state=cold splunk_server=Martin-PC | stats count sum(sizeOnDiskMB) over All Time (Martin-PC is my server name - substitute yours); it might take a moment.
Compare the results with what you see on disk - I'm trying to check whether Splunk is still tracking all of those buckets, i.e. whether the freeze never started or the freeze itself failed. If you spot buckets on disk that aren't known to Splunk, you should be able to rm those fairly safely - Splunk will probably never clean them up on its own.
In both cases there should be events in _internal complaining about errors; are all BucketMover events just moves from warm to cold? Make sure not to check just 60 minutes - freezing may not happen every day.


Re: We have a shortage of disk space in one indexer. Can we delete data present in the colddb directory?

Motivator

thanks Martin, I ran the query you shared in the comment, but when I tried to execute it with the time frame set to All Time, it threw this error:
"Error in 'dbinspect' command: This command is not supported in a real-time search"

So I tried executing it with the time frame set to two years and got "No results found":

| dbinspect index=_internal | search state=cold splunk_server=splunk03 | stats count sum(sizeOnDiskMB)

"You should eventually clean up the four different locations all defining indexes.conf" - I could not understand this, so can you tell me what exactly I need to do here? Do you mean I need to check the below location?

[net_fw]
homePath = volume:Hot/net_fw/db
coldPath = volume:Cold/net_fw/colddb
tstatsHomePath = volume:Hot/net_fw/datamodel_summary
thawedPath = $SPLUNK_DB/net_fw/thaweddb
maxTotalDataSizeMB = 250000

thanks in advance.


Re: We have a shortage of disk space in one indexer. Can we delete data present in the colddb directory?

SplunkTrust

Regarding the query, I forgot to change the index - you should of course use | dbinspect index=net_fw to match yours. Regarding the time range, use All Time, not All Time (Real-time)... though two years should have the same effect.
If you still see nothing, remove the | search and check if your splunk server's name is correct.
If you still see nothing, have one of your Splunk admins run the query - you might be lacking permissions then.

Regarding cleaning up, it seems you have an old app from 2013 that used to define the indexes, and a new app starting with ADMIN also defining the indexes. Splunk is good at merging these, but having multiple locations just increases the room for human error.


Re: We have a shortage of disk space in one indexer. Can we delete data present in the colddb directory?

Motivator

Martin, after executing the query with the time period set to 2 years I am still getting "No results found". I even tried removing the search command, but still no luck. Regarding permissions, I believe I have admin privileges.

|dbinspect index = net_fw | search state=cold splunk_server=splunk03 | stats count sum(sizeOnDiskMB)

Regarding the old app file "/opt/splunk/etc/apps/allindexerbase/local/indexes.conf.2013.06.03" - should I comment out the entire stanza?
thanks in advance.


Re: We have a shortage of disk space in one indexer. Can we delete data present in the colddb directory?

SplunkTrust

It seems dbinspect is picky about spaces - make sure you remove the spaces around the equals sign: | dbinspect index=net_fw
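In full, the working search should read (splunk03 per your df output):

| dbinspect index=net_fw | search state=cold splunk_server=splunk03 | stats count sum(sizeOnDiskMB)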
