
Why is Index bucket rolling faster than what is set in indexes.conf?

power12
Communicator

Hello Splunkers
I am pretty new to Splunk administration. I have the following configuration in indexes.conf, where I set the hot bucket span to one day:

[default]
maxHotSpanSecs = 86400
[splunklogger]
archiver.enableDataArchive = 0
bucketRebuildMemoryHint = 0
compressRawdata = 1
enableDataIntegrityControl = 1
enableOnlineBucketRepair = 1
enableTsidxReduction = 0
metric.enableFloatingPointCompression = 1
minHotIdleSecsBeforeForceRoll = 0
rtRouterQueueSize =
rtRouterThreads =
selfStorageThreads =
suspendHotRollByDeleteQuery = 0
syncMeta = 1
tsidxWritingLevel =

But I'm not sure why it is chunking the data this way; going by the timestamps, buckets are rolling roughly every 4.5-5 hours. What changes should I make to indexes.conf?

root@login-prom4:/raid/splunk-var/lib/splunk/abc/db# du -sh ./*
4.0K    ./CreationTime
756M    ./db_1675137103_1675119933_1
756M    ./db_1675154294_1675137102_2
849M    ./db_1675171544_1675154293_3
750M    ./hot_v1_0
617M    ./hot_v1_4
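
The bucket directory names follow db_<newestEventEpoch>_<oldestEventEpoch>_<id>, so the span of each bucket can be computed from the listing above, for example:

echo $((1675137103 - 1675119933))   # 17170 seconds, i.e. roughly 4.8 hours for the first warm bucket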

Thanks in Advance 


isoutamo
SplunkTrust

Hi

1st, you should use the

splunk btool indexes list abc [--debug]

command to see which parameters are in effect for index abc (as you seem to have named your index). You can add --debug to get information about where (in which file) each setting is defined. You must run this on your indexer host. If your indexes are stored on volumes, check the volume parameters as well.
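
For example, to check any volume stanzas as well (this lists every [volume:...] stanza defined in your indexes.conf):

splunk btool indexes list volume: --debug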

2nd, there are several parameters and other events that determine when a hot bucket is closed and rolled to warm, not only maxHotSpanSecs! At least the following situations will trigger hot-to-warm rolling:

  • When Splunk (re)starts. A bucket is rolled from hot to warm when Splunk starts (not when it shuts down!)
  • When the hot bucket becomes full. The bucket size is defined by the maxDataSize parameter (normally 750MB to 10GB, depending on its value)
  • When maxHotSpanSecs is reached.
  • When maxHotIdleSecs is reached, i.e. the bucket hasn't received any data within that many seconds.
  • When maxHotBuckets is reached. There can be only this number of open hot buckets per index (actually per ingestion pipeline)
  • When it receives an event whose timestamp is much older than the other events in the bucket

There may be other reasons too, but I think those are the most common causes of rolling from hot to warm. Also check the volume parameters if you have stored your hot and warm buckets on volumes instead of directly in the filesystem.
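
Just as an illustration (not a recommendation to change anything), these are the settings involved, shown with the value from your post and the usual defaults, for an index named abc:

[abc]
# roll on event-time span (what you already set: one day)
maxHotSpanSecs = 86400
# roll on size; auto means a maximum of roughly 750MB
maxDataSize = auto
# roll when no new data arrives for this many seconds; 0 disables the idle check
maxHotIdleSecs = 0
# maximum number of open hot buckets per index
maxHotBuckets = auto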

r. Ismo


power12
Communicator

@isoutamo Thank you so much for your reply. I ran the btool command and got the output below. Can you please suggest any changes to it?

./splunk btool indexes list prometheus [--debug]
[prometheus]
archiver.enableDataArchive = false
archiver.maxDataArchiveRetentionPeriod = 0
assureUTF8 = false
bucketMerge.maxMergeSizeMB = 1000
bucketMerge.maxMergeTimeSpanSecs = 7776000
bucketMerge.minMergeSizeMB = 750
bucketMerging = false
bucketRebuildMemoryHint = auto
coldPath = $SPLUNK_DB/prometheus/colddb
coldPath.maxDataSizeMB = 0
coldToFrozenDir =
coldToFrozenScript =
compressRawdata = true
datatype = event
defaultDatabase = main
enableDataIntegrityControl = 0
enableOnlineBucketRepair = true
enableRealtimeSearch = true
enableTsidxReduction = 0
federated.dataset =
federated.provider =
fileSystemExecutorWorkers = 5
frozenTimePeriodInSecs = 188697600
homePath = $SPLUNK_DB/prometheus/db
homePath.maxDataSizeMB = 0
hotBucketStreaming.deleteHotsAfterRestart = false
hotBucketStreaming.extraBucketBuildingCmdlineArgs =
hotBucketStreaming.removeRemoteSlicesOnRoll = false
hotBucketStreaming.reportStatus = false
hotBucketStreaming.sendSlices = false
hotBucketTimeRefreshInterval = 10
indexThreads = auto
journalCompression = gzip
maxBloomBackfillBucketAge = 30d
maxBucketSizeCacheEntries = 0
maxConcurrentOptimizes = 6
maxDataSize = auto
maxGlobalDataSizeMB = 0
maxGlobalRawDataSizeMB = 0
maxHotBuckets = auto
maxHotIdleSecs = 0
maxHotSpanSecs = 86400
maxMemMB = 5
maxMetaEntries = 1000000
maxRunningProcessGroups = 8
maxRunningProcessGroupsLowPriority = 1
maxTimeUnreplicatedNoAcks = 300
maxTimeUnreplicatedWithAcks = 60
maxTotalDataSizeMB = 512000
maxWarmDBCount = 300
memPoolMB = auto
metric.compressionBlockSize = 1024
metric.enableFloatingPointCompression = true
metric.maxHotBuckets = auto
metric.splitByIndexKeys =
metric.stubOutRawdataJournal = true
metric.timestampResolution = s
metric.tsidxTargetSizeMB = 1500
minHotIdleSecsBeforeForceRoll = auto
minRawFileSyncSecs = disable
minStreamGroupQueueSize = 2000
partialServiceMetaPeriod = 0
processTrackerServiceInterval = 1
quarantineFutureSecs = 2592000
quarantinePastSecs = 77760000
rawChunkSizeBytes = 131072
repFactor = 0
rotatePeriodInSecs = 60
rtRouterQueueSize = 10000
rtRouterThreads = 0
selfStorageThreads = 2
serviceInactiveIndexesPeriod = 60
serviceMetaPeriod = 25
serviceOnlyAsNeeded = true
serviceSubtaskTimingPeriod = 30
splitByIndexKeys =
streamingTargetTsidxSyncPeriodMsec = 5000
suppressBannerList =
suspendHotRollByDeleteQuery = false
sync = 0
syncMeta = true
thawedPath = $SPLUNK_DB/prometheus/thaweddb
throttleCheckPeriod = 15
timePeriodInSecBeforeTsidxReduction = 604800
tsidxDedupPostingsListMaxTermsLimit = 8388608
tsidxReductionCheckPeriodInSec = 600
tsidxTargetSizeMB = 1500
tsidxWritingLevel = 2
tstatsHomePath = volume:_splunk_summaries/$_index_name/datamodel_summary
waitPeriodInSecsForManifestWrite = 60
warmToColdScript =

isoutamo
SplunkTrust

What is the real issue you are trying to solve? A couple of buckets per day in one index is not a problem.

Based on your 1st post, you have those ~750MB buckets as you should. You seem to have a single indexer, so the daily ingestion volume is probably not large enough to justify moving to a bigger bucket size (like 10GB).

If you have a recent Splunk version, I would change the following parameters (a sketch is shown after the list):

  • tsidxWritingLevel = 4 (or whatever the largest accepted value is in your version); this is a global option
  • journalCompression = zstd (or lz4 if zstd is not supported); this is a per-index option
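
For example (index name from this thread; check the indexes.conf spec for your Splunk version before applying):

# global option, e.g. in the [default] stanza
[default]
tsidxWritingLevel = 4

# per-index option
[prometheus]
journalCompression = zstd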

power12
Communicator

@isoutamo I want to configure a setting that would make the buckets span 86400 seconds, which is one day. Yes, our Splunk environment is a single instance. I am trying to create buckets with one day of data in them.

Can you please let me know which setting in my indexes.conf results in the 750MB buckets?

Thanks in Advance


isoutamo
SplunkTrust

That is maxDataSize. The default is auto, which caps the bucket size at roughly 750MB. https://docs.splunk.com/Documentation/Splunk/latest/Admin/Indexesconf
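
If you really do want day-long hot buckets, both the time span and the size limit have to allow it; a minimal sketch (values are examples only, not a recommendation):

[prometheus]
maxHotSpanSecs = 86400
# auto_high_volume allows buckets of up to about 10GB on 64-bit systems
maxDataSize = auto_high_volume
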
Still, I don't understand your issue with buckets! Why do you want one bucket to contain all events from one day, or actually from 24h? What is the issue you are trying to solve with this conf change?

Actually, if/when you take SmartStore into use, those bigger buckets are a real issue, and you must use buckets of at most 750MB.


power12
Communicator

@isoutamo I wrote a Python script that explores the splunk-var indexes, calculates their total size, and then asks the user if they'd like to back them up.

After the user indicates which indexes they'd like to back up, it copies all buckets and other metadata in the db path (excluding the hot buckets) to a directory specified as a command-line argument.

My question is: how do I actually back up the files? Is it as simple as copying out the directory and then later copying it back in and restarting Splunk?
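
For example, is something like this all that's needed (source path from the du output above; the destination is only a placeholder, and hot_* is excluded since hot buckets are still being written)?

rsync -av --exclude='hot_*' /raid/splunk-var/lib/splunk/abc/db/ /backup/abc/db/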


isoutamo
SplunkTrust
Please create a separate question about Splunk backup. That way others can find it more easily.

shivanshu1593
Builder

Hello,

You are looking at the /var/lib/splunk/db directories for the indexes, where the hot and warm data is stored (cold too, unless you have configured a separate directory/mount for it in indexes.conf). As data rolls in and gets stored on the indexers, the modification timestamps of these directories will obviously change, but that doesn't mean your data got rolled from hot to warm to cold, unless you are looking at a huge amount of incoming data (100+ GB every hour) or your index size is way too small. Are you able to search the data that you need? If so, you don't need to worry about this unless you have specific storage requirements for particular days.
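
If you want to confirm the bucket states and time ranges from Splunk itself rather than from the filesystem, the dbinspect command can show them (index name taken from the btool output earlier in this thread):

| dbinspect index=prometheus
| table bucketId, state, startEpoch, endEpoch, sizeOnDiskMB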

++If it helps, please consider accepting as an answer++

Thank you,
Shiv
###If you found the answer helpful, kindly consider upvoting/accepting it as the answer as it helps other Splunkers find the solutions to similar issues###