Getting Data In

We have a shortage of disk space in one indexer. Can we delete data present in the colddb directory?

Hemnaath
Motivator

We are currently running out of space on one of the five Splunk indexers in our distributed environment. We are using Splunk version 6.2.1.
The total size of the indexer volume is about 5.6 TB. We currently have less than 100 GB free, and on average another 10 GB is consumed every day. Much of the data occupying space is almost 3.5 years old, and most of it sits under the colddb directories beneath the mount point /splogs.

Disk Usage status

df -h /splogs
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/vg_splunk03_san-splunk_logs
                      5.6T  5.3T   93G  99% /splogs

We found that most of the space is occupied by these indexes:

[net_proxy], [net_fw], [unix_svrs] & [unix_bsm] 

Example:

[root@splunk03 splogs]# cd unix_svrs
[root@splunk03 unix_svrs]# ls -ltr
total 416
drwx------    2 splunk splunk   4096 Apr 19  2012 thaweddb
drwx------ 1590 splunk splunk 102400 Aug  6 09:18 colddb
drwx------ 1890 splunk splunk 131072 Aug  6 12:51 summary
drwx------ 1893 splunk splunk 143360 Aug  6 12:53 datamodel_summary
drwx------  307 splunk splunk  28672 Aug  6 12:54 db
[root@splunk03 unix_svrs]# du -sh *
1007G   colddb
1.6G    datamodel_summary
229G    db
366M    summary
4.0K    thaweddb

[root@splunk03 splogs]# cd net_fw
[root@splunk03 net_fw]# ls -ltr
total 612
drwx------    2 splunk splunk   4096 Apr 19  2012 thaweddb
drwx------ 1358 splunk splunk 131072 Sep 27  2015 summary
drwx------ 2956 splunk splunk 180224 Aug  6 12:17 colddb
drwx------ 3258 splunk splunk 266240 Aug  6 12:55 datamodel_summary
drwx------  313 splunk splunk  28672 Aug  6 12:55 db
[root@splunk03 net_fw]# du -sh *
1.3T    colddb
76G     datamodel_summary
147G    db
24M     summary
4.0K    thaweddb

indexes.conf details for these indexes

[volume:Hot]
path = /splogs

[volume:Cold]
path = /splogs

[volume:Base]
path = /splogs

[default]
frozenTimePeriodInSecs = 31536000

[net_fw]
homePath = volume:Hot/net_fw/db
coldPath = volume:Cold/net_fw/colddb
tstatsHomePath = volume:Hot/net_fw/datamodel_summary
thawedPath = $SPLUNK_DB/net_fw/thaweddb
maxTotalDataSizeMB = 250000

[unix_svrs]
homePath = volume:Hot/unix_svrs/db
coldPath = volume:Cold/unix_svrs/colddb
tstatsHomePath = volume:Hot/unix_svrs/datamodel_summary
thawedPath = $SPLUNK_DB/unix_svrs/thaweddb
maxTotalDataSizeMB = 250000

[summary]
frozenTimePeriodInSecs = 188697600

The other indexers are configured in the same manner in indexes.conf as shown above.
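
For reference, converting the two retention periods above into days is plain arithmetic:

31536000 s  / 86400 s per day = 365 days  (about 1 year)  -> [default]
188697600 s / 86400 s per day = 2184 days (about 6 years) -> [summary]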

Kindly let me know whether we can delete the data present under the colddb directories for the indexes occupying more than 1 TB. What would the impact of doing this be? Or is there any other way to prevent the Splunk service from failing due to low disk space?


Hemnaath
Motivator

Thanks Martin. I ran the query you shared in the comment, but when I tried to execute it with the time range set to All Time, it threw this error:
"Error in 'dbinspect' command: This command is not supported in a real-time search"

So I tried executing it with the time range set to two years and got "No results found".

| dbinspect index=_internal | search state=cold splunk_server=splunk03 | stats count sum(sizeOnDiskMB)
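
For what it's worth, that dbinspect error usually appears when the "All time (real-time)" preset is selected instead of the historical "All time". A minimal non-real-time variant aimed at the two large indexes might look like this (a sketch; the index and server names are taken from the posts above, and the fields are standard dbinspect output):

| dbinspect index=net_fw index=unix_svrs
| search state=cold splunk_server=splunk03
| stats count sum(sizeOnDiskMB) by index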

"You should eventually clean up the four different locations all defining indexes.conf" — I could not understand this point, so can you tell me what exactly I need to do here? Do you mean I need to check the below location?

[net_fw]
homePath = volume:Hot/net_fw/db
coldPath = volume:Cold/net_fw/colddb
tstatsHomePath = volume:Hot/net_fw/datamodel_summary
thawedPath = $SPLUNK_DB/net_fw/thaweddb
maxTotalDataSizeMB = 250000

thanks in advance.


martin_mueller
SplunkTrust

If that's the space used by one indexer alone, something else is going on.
Are your indexers replicating buckets as part of an indexer cluster?
What do you see when you go to Settings -> Distributed Management Console -> Indexing -> Indexes and Volumes -> Index Detail: Instance and select the indexer and one of the two indexes? (Assuming recent version of Splunk)
When you ran the search I gave you, what results did you see?
Is the remaining free space shrinking?

Regarding frozen archive policy: Not having an explicit configuration about what to do with data when it freezes means it gets deleted.
Regarding time until frozen: Run $SPLUNK_HOME/bin/splunk cmd btool --debug indexes list net_fw to see which settings are used for that index. Buckets get deleted when the index's maximum space is filled or when a bucket crosses over the frozen time period, whichever comes first.
Regarding rm -rf: I'd recommend letting Splunk delete buckets. Before you really run out of space, you can always stop Splunk and rm the oldest buckets manually... but the best course of action is to understand what's going on and fix anything that's potentially broken, so that Splunk then deletes buckets by itself.
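
For anyone reading along, a minimal sketch of what an explicit archive policy could look like, assuming you wanted frozen buckets copied to an archive directory instead of deleted (the /splogs_archive path is purely an example):

[net_fw]
coldToFrozenDir = /splogs_archive/net_fw/frozendb

Note that archiving to the same nearly-full volume would not free any space; in this situation the default behaviour (delete on freeze) is what actually reclaims disk.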


Hemnaath
Motivator

Thanks Martin. Yes, I have checked all the index instances individually and found that on this particular indexer instance the data under /splogs/net_fw/colddb/ is not being deleted after reaching the frozen time period.
When compared with the other four indexer instances, we found data only from May 16 to Jun 16.

Are your indexers replicating buckets as part of an indexer cluster?

Ours is not a clustered environment. In all the indexes.conf files we did not find a repFactor setting; moreover, when we ran the command we found that the index setting is repFactor = 0. We also ran
./splunk cmd btool server list --debug shclustering and it returned only default values from /etc/system/default/server.conf on all the indexer servers.

$SPLUNK_HOME/bin/splunk cmd btool --debug indexes list net_fw
/opt/splunk/etc/apps/ADMIN-all_indexers/local/indexes.conf [net_fw]
/opt/splunk/etc/system/default/indexes.conf assureUTF8 = false
/opt/splunk/etc/system/default/indexes.conf blockSignSize = 0
/opt/splunk/etc/system/default/indexes.conf blockSignatureDatabase = _blocksignature
/opt/splunk/etc/system/default/indexes.conf bucketRebuildMemoryHint = auto
/opt/splunk/etc/apps/ADMIN-all_indexers/local/indexes.conf coldPath = volume:Cold/net_fw/colddb
/opt/splunk/etc/system/default/indexes.conf coldPath.maxDataSizeMB = 0
/opt/splunk/etc/system/default/indexes.conf coldToFrozenDir =
/opt/splunk/etc/system/default/indexes.conf coldToFrozenScript =
/opt/splunk/etc/system/default/indexes.conf compressRawdata = true
/opt/splunk/etc/system/default/indexes.conf defaultDatabase = main
/opt/splunk/etc/system/default/indexes.conf enableOnlineBucketRepair = true
/opt/splunk/etc/system/default/indexes.conf enableRealtimeSearch = true
/opt/splunk/etc/apps/ADMIN-all_indexers/local/indexes.conf frozenTimePeriodInSecs = 31536000
/opt/splunk/etc/apps/ADMIN-all_indexers/local/indexes.conf homePath = volume:Hot/net_fw/db
/opt/splunk/etc/system/default/indexes.conf homePath.maxDataSizeMB = 0
/opt/splunk/etc/system/default/indexes.conf hotBucketTimeRefreshInterval = 10
/opt/splunk/etc/system/default/indexes.conf indexThreads = auto
/opt/splunk/etc/system/default/indexes.conf maxBloomBackfillBucketAge = 30d
/opt/splunk/etc/system/default/indexes.conf maxBucketSizeCacheEntries = 0
/opt/splunk/etc/system/default/indexes.conf maxConcurrentOptimizes = 6
/opt/splunk/etc/system/default/indexes.conf maxDataSize = auto
/opt/splunk/etc/system/default/indexes.conf maxHotBuckets = 3
/opt/splunk/etc/system/default/indexes.conf maxHotIdleSecs = 0
/opt/splunk/etc/system/default/indexes.conf maxHotSpanSecs = 7776000
/opt/splunk/etc/system/default/indexes.conf maxMemMB = 5
/opt/splunk/etc/system/default/indexes.conf maxMetaEntries = 1000000
/opt/splunk/etc/system/default/indexes.conf maxRunningProcessGroups = 8
/opt/splunk/etc/system/default/indexes.conf maxRunningProcessGroupsLowPriority = 1
/opt/splunk/etc/system/default/indexes.conf maxTimeUnreplicatedNoAcks = 300
/opt/splunk/etc/system/default/indexes.conf maxTimeUnreplicatedWithAcks = 60
/opt/splunk/etc/apps/ADMIN-all_indexers/local/indexes.conf maxTotalDataSizeMB = 250000
/opt/splunk/etc/system/default/indexes.conf maxWarmDBCount = 300
/opt/splunk/etc/system/default/indexes.conf memPoolMB = auto
/opt/splunk/etc/system/default/indexes.conf minRawFileSyncSecs = disable
/opt/splunk/etc/system/default/indexes.conf minStreamGroupQueueSize = 2000
/opt/splunk/etc/system/default/indexes.conf partialServiceMetaPeriod = 0
/opt/splunk/etc/system/default/indexes.conf processTrackerServiceInterval = 1
/opt/splunk/etc/system/default/indexes.conf quarantineFutureSecs = 2592000
/opt/splunk/etc/system/default/indexes.conf quarantinePastSecs = 77760000
/opt/splunk/etc/system/default/indexes.conf rawChunkSizeBytes = 131072
/opt/splunk/etc/system/default/indexes.conf repFactor = 0
/opt/splunk/etc/system/default/indexes.conf rotatePeriodInSecs = 60
/opt/splunk/etc/system/default/indexes.conf serviceMetaPeriod = 25
/opt/splunk/etc/system/default/indexes.conf serviceOnlyAsNeeded = true
/opt/splunk/etc/system/default/indexes.conf serviceSubtaskTimingPeriod = 30
/opt/splunk/etc/system/default/indexes.conf streamingTargetTsidxSyncPeriodMsec = 5000
/opt/splunk/etc/system/default/indexes.conf suppressBannerList =
/opt/splunk/etc/system/default/indexes.conf sync = 0
/opt/splunk/etc/system/default/indexes.conf syncMeta = true
/opt/splunk/etc/apps/ADMIN-all_indexers/local/indexes.conf thawedPath = $SPLUNK_DB/net_fw/thaweddb
/opt/splunk/etc/system/default/indexes.conf throttleCheckPeriod = 15
/opt/splunk/etc/apps/ADMIN-all_indexers/local/indexes.conf tstatsHomePath = volume:Hot/net_fw/datamodel_summary
/opt/splunk/etc/system/default/indexes.conf warmToColdScript =

What do you see when you go to Settings -> Distributed Management Console -> Indexing -> Indexes and Volumes -> Index Detail: Instance and select the indexer and one of the two indexes?

I checked in the DMC but was unable to find the option Indexing -> Indexes and Volumes -> Index Detail on one of the search heads. We are using version 6.2.1.

When you ran the search I gave you, what results did you see?
Is the remaining free space shrinking?

index=_internal component=bucketmover idx=net_fw (interval 60 min)

08-09-2016 11:09:53.114 -0400 INFO BucketMover - idx=net_fw Moving bucket='db_1466869169_1466854110_2713' because maximum number of warm databases exceeded, starting warm_to_cold: from='/splogs/net_fw/db' to='/splogs/net_fw/colddb'
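
As a follow-up check (a sketch reusing the BucketMover search above): the same _internal data can show whether any buckets are actually being frozen rather than only rolled from warm to cold. The freeze-related messages generally mention "freeze" or "frozen", though the exact wording may differ in 6.2, so adjust the keywords if nothing comes back:

index=_internal sourcetype=splunkd component=BucketMover idx=net_fw (freeze OR frozen)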

There are 46 indexes.conf files configured on each indexer instance, and of these, four indexes.conf files contain stanzas for this index. I am not sure why they created four indexes.conf files for the same index.

1) /opt/splunk/etc/apps/ADMIN-all_indexers/default/Indexes.conf

[volume:Hot]
path = /splogs

[volume:Cold]
path = /splogs

[volume:Base]
path = /splogs

[default]
frozenTimePeriodInSecs = 31536000

[net_fw]
homePath = volume:Hot/net_fw/db
coldPath = volume:Cold/net_fw/colddb
tstatsHomePath = volume:Hot/net_fw/datamodel_summary
thawedPath = $SPLUNK_DB/net_fw/thaweddb

2) /opt/splunk/etc/apps/ADMIN-all_indexers/local/indexes.conf

[volume:Hot]
path = /splogs

[volume:Cold]
path = /splogs

[volume:Base]
path = /splogs

[default]
frozenTimePeriodInSecs = 31536000

[net_fw]
homePath = volume:Hot/net_fw/db
coldPath = volume:Cold/net_fw/colddb
tstatsHomePath = volume:Hot/net_fw/datamodel_summary
thawedPath = $SPLUNK_DB/net_fw/thaweddb
maxTotalDataSizeMB = 250000 --> this setting is included in this file

3) /opt/splunk/etc/apps/all_indexer_base/local/Indexes.conf

[volume:Hot]
path = /splogs

[volume:Cold]
path = /splogs

[volume:Base]
path = /splogs

[net_fw]
homePath = volume:Hot/net_fw/db
coldPath = volume:Cold/net_fw/colddb
thawedPath = $SPLUNK_DB/net_fw/thaweddb

No frozenTimePeriodInSecs is set in this file.

4) /opt/splunk/etc/apps/all_indexer_base/local/indexes.conf.2013.06.03

[volume:Hot]
path = /splogs

[volume:Cold]
path = /splogs

[volume:Base]
path = /splogs

[net_fw]
homePath = volume:Hot/net_fw/db
coldPath = volume:Cold/net_fw/colddb
thawedPath = $SPLUNK_DB/net_fw/thaweddb

No frozenTimePeriodInSecs is set in this file either.

I am sure that going through this much detail will be painful, sorry for that. But I wanted to share everything so we can work out what exactly is broken in my environment and why Splunk is not deleting the buckets.
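
For reference, a quick way to see which of those four files actually wins for the retention settings is to filter the btool output already shown above (the grep pattern is just illustrative; btool prints the winning file on the left of each line). Note that indexes.conf.2013.06.03 is presumably an old dated backup; Splunk only reads files named *.conf, so it is ignored anyway:

$SPLUNK_HOME/bin/splunk cmd btool --debug indexes list net_fw | grep -E 'frozenTimePeriodInSecs|maxTotalDataSizeMB|coldToFrozen'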

thanks in advance.


Hemnaath
Motivator

Thanks Martin for throwing some light on this issue.

We have five individual physical servers as Splunk indexer instances, each with 5.6 TB of space configured and each pointed at its own separate /splogs mount point. The problem is only with one indexer instance, which is running out of space.

I executed the query mentioned in the comment above, but I am not sure how to check whether the old data is being removed or not.

maxTotalDataSizeMB (250 GB) is set only for a few indexes in the indexes.conf file, but how do I check whether data is being deleted after reaching the 250 GB mark? I mean, how and where do I check for frozen buckets? There is no archive policy set in the indexes.conf file to archive the data.
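
One way to check (a sketch based on the dbinspect command suggested earlier in this thread; endEpoch, state, and sizeOnDiskMB are standard dbinspect fields) is to look at the cold buckets for the index on that indexer and see how old their newest events are:

| dbinspect index=net_fw
| search splunk_server=splunk03 state=cold
| stats count sum(sizeOnDiskMB) as totalMB min(endEpoch) as oldestBucketEnd
| eval oldestBucketEnd = strftime(oldestBucketEnd, "%Y-%m-%d")

Cold buckets whose newest event is more than a year old should already have been frozen (and, with no archive policy, deleted) under frozenTimePeriodInSecs = 31536000.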

There are two frozenTimePeriodInSecs values set: one under the [default] stanza (frozenTimePeriodInSecs = 31536000) and another under the [summary] stanza (frozenTimePeriodInSecs = 188697600). So which one will be taken into consideration when deleting the data?

Can rm -rf be used to delete the data present under the colddb directory?

thanks in advance.
