We have moved from a 1-indexer environment to a new 3-indexer environment with index replication. We moved all of the buckets from the old environment to the 3 new indexers, split up evenly across their cold bucket directories.
The new Indexers are configured with local SSD storage for hot/warm, and a DAS for cold storage.
I am having a difficult time understanding how to monitor the hot/warm storage vs cold storage. I want to be able to keep 30 days of data in our indexes on our SSD drives, and move everything else to cold.
For example, here is what I have configured for our netflow index in indexes.conf:
[volume:HotWarm]
path = /logs
maxVolumeDataSizeMB = 700000
[volume:Cold]
path = /daslogs
maxVolumeDataSizeMB = 22500000
[netflow]
maxDataSize = auto_high_volume
repFactor = auto
homePath = volume:HotWarm/netflow/db
homePath.maxDataSizeMB = 100000
coldPath = volume:Cold/netflow/colddb
coldPath.maxDataSizeMB = 300000
thawedPath = /daslogs/netflow/thaweddb
frozenTimePeriodInSecs = 7776000
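To sanity-check whether roughly 30 days of netflow fits under these limits, one rough approach is to measure daily ingest for the index from the license usage log and multiply by 30. Licensed size is raw data size, so on-disk usage will differ (compression plus index files), and with repFactor=auto each indexer also stores replicated copies that count against the same volumes, so treat it as an estimate only:
index=_internal source=*license_usage.log type=Usage idx=netflow
| timechart span=1d sum(b) as bytes
| eval GBperDay = round(bytes / 1024 / 1024 / 1024, 1)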
For the stanza above, should I configure these limits for what the local indexer sees, or for the total across the 3 indexers? It seems to behave as if it applies to the local indexer, meaning once an indexer reaches its maxDataSizeMB limits, it starts rolling buckets to cold and freezing buckets.
On each indexer, this is how much space the HotWarm takes up for netflow:
97G
94G
97G
On each indexer, this is how much space the Cold takes up for netflow:
293G
293G
293G
Here is the query I am using to monitor HotWarm:
| dbinspect index=netflow
| search state=hot OR state=warm
| stats sum(sizeOnDiskMB) as HotWarm
| eval HotWarmTotal = HotWarm / 1024
| gauge HotWarmTotal 0 100 200 300
Result is 274.5G
Here is the query I am using to monitor Cold:
| dbinspect index=netflow
| search state=cold
| stats sum(sizeOnDiskMB) as sizeOnDiskMB
| eval sizeOnDiskGB = sizeOnDiskMB / 1024
| gauge sizeOnDiskGB 0 300 600 900
Result is 878.6G
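As a side note, both tiers can be read in one dbinspect pass; here is a sketch along the same lines (like the queries above, it sums across all 3 indexers):
| dbinspect index=netflow
| search state=hot OR state=warm OR state=cold
| eval tier = if(state="cold", "Cold", "HotWarm")
| stats sum(sizeOnDiskMB) as MB by tier
| eval GB = round(MB / 1024, 1)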
So the queries I am using add all 3 indexers together, but the indexes.conf settings are essentially set up per indexer, i.e. divided by 3.
Is this the right way to monitor it? When I run these queries, should I just assume the load is divided evenly by 3?
Sorry for the long post. Just trying to explain this thoroughly.
Thanks!
The indexes.conf file applies to an individual indexer (peer node); it has no knowledge of how many other members are in the cluster or how much data might be on each member.
Therefore your indexes.conf needs to be designed per indexer.
For example:
[volume:HotWarm]
path = /logs
maxVolumeDataSizeMB = 700000
That is 683GB per indexer (700000/1024), not 683GB of hot/warm data for the entire cluster. It also lines up with what you are seeing: homePath.maxDataSizeMB = 100000 is ~98GB, and each of your indexers shows ~97GB of hot/warm netflow.
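If you prefer to think in cluster-wide totals, divide by the member count before setting the value. For example, to cap hot/warm at roughly 2TB across the whole 3-indexer cluster (the 2TB figure is just an illustration, and it again assumes evenly balanced data):
maxVolumeDataSizeMB = 2 * 1024 * 1024 / 3 = ~699050 per indexer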
You could narrow your dbinspect down to a single indexer; dividing by the number of indexers should also roughly work, but that assumes the data is evenly balanced between your indexer cluster members.
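For example, here is a sketch that breaks the hot/warm total out per indexer (splunk_server is a field on the dbinspect results, so you can also filter on it to look at a single member; the rounding is just for readability):
| dbinspect index=netflow
| search state=hot OR state=warm
| stats sum(sizeOnDiskMB) as HotWarmMB by splunk_server
| eval HotWarmGB = round(HotWarmMB / 1024, 1)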
Newer Splunk versions support data rebalancing to help keep data evenly spread.
I use the query:
| tstats count WHERE index="*" by splunk_server _time span=10m | timechart span=10m sum(count) by splunk_server
I then visualize this in an area chart with 100% stacked mode to see whether the data is evenly spread among cluster members.
If it's not even, you might need to do some tweaking and also run a data rebalance.
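If you do need to rebalance, it is kicked off from the cluster master; something along these lines (available from Splunk 6.4 on; the -index flag and exact syntax may vary by version, so check the docs for yours):
splunk rebalance cluster-data -action start -index netflow
splunk rebalance cluster-data -action status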