Difference in license usage and disk space usage

Path Finder

Hi,

I have an index with the following configuration:

[index1]
coldPath = $SPLUNK_DB/index1/colddb
homePath = $SPLUNK_DB/index1/db
thawedPath = $SPLUNK_DB/index1/thaweddb
maxDataSize = auto_high_volume
frozenTimePeriodInSecs = 31536000
maxTotalDataSizeMB = 5000000
repFactor = auto

On the license master, I can see that the cumulative raw data size for index1 is 619GB. However, on the indexers, $SPLUNK_DB/index1/colddb and $SPLUNK_DB/index1/db take up 1.8TB and 2.1TB respectively.

Is this right?

Is there any way I can reduce the disk usage while keeping the data retention period unchanged?

Thanks.

Regards,
Jackie

1 Solution

Communicator

OK, so it sounds like the discrepancy in part 1 comes down to a difference in retention. The license master's usage logs live in the _internal index, which is retained for a much shorter time than the index shown here, so the disk utilization on the indexers represents a much longer time period than what's reflected in _internal.

To address the second part of the question, reducing disk usage without changing the retention period, here are a few options:

  • Check your cluster master to see if there are excess bucket copies that you can remove to free up some space.
  • If the data is infrequently accessed past a certain age or if slower searches beyond a certain age aren't a concern, look into tsidx reduction.
  • Make sure your replication and search factors aren't too high as this will require additional space for the extra copies.
  • Use something like |dbinspect index=index1 | chart count by guId to check that the high disk usage you saw for that index isn't just a bucket imbalance on the indexers you looked at. If the bucket counts aren't reasonably close to even, first make sure incoming data isn't imbalanced, then consider a cluster rebalance for just that index (or include others if you notice the problem on other indexes too).
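
If it helps to put a number on "reasonably close to even", here is a minimal Python sketch that takes the per-guId bucket counts from the dbinspect search and flags indexers that sit far from the average. The counts, the indexer names, and the 25% tolerance are all made-up assumptions for illustration, not Splunk defaults.

```python
from statistics import mean

# Hypothetical bucket counts per indexer guId, e.g. copied by hand
# from the output of the dbinspect search. Not real data.
bucket_counts = {
    "indexer-A": 1200,
    "indexer-B": 1150,
    "indexer-C": 2100,
}

# Assumption: treat anything more than 25% away from the mean as imbalanced.
TOLERANCE = 0.25


def find_imbalanced(counts, tolerance):
    """Return the indexers whose bucket count deviates from the mean
    by more than the given fraction."""
    avg = mean(counts.values())
    return {
        guid: n
        for guid, n in counts.items()
        if abs(n - avg) / avg > tolerance
    }


print(find_imbalanced(bucket_counts, TOLERANCE))
```

With these sample numbers only indexer-C is flagged, which would be the hint to look at incoming data distribution and then at a rebalance.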

Communicator

How long are your license master logs being retained? Your cumulative size reported from the license master is limited by your log retention on the _internal index, but it looks like the index you are asking about holds a full year's worth of data.

Path Finder

Ah! That explains the difference in the numbers. I didn't change the license master setup, so it should be the default 30 days. Thanks a lot!
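
For a rough sense of scale, the numbers from this thread can be worked through in a few lines of Python. This is only a back-of-the-envelope sketch: it assumes the 619GB figure covers about 30 days and that index1 has been ingesting at roughly the same rate for its whole retention period, neither of which is confirmed beyond what is said above.

```python
# Values taken from the thread.
FROZEN_TIME_PERIOD_SECS = 31_536_000  # frozenTimePeriodInSecs for index1
LICENSE_WINDOW_GB = 619               # cumulative raw size seen on the license master

# Assumption: the license usage logs cover about 30 days.
LICENSE_WINDOW_DAYS = 30

SECONDS_PER_DAY = 86_400

retention_days = FROZEN_TIME_PERIOD_SECS / SECONDS_PER_DAY
daily_raw_gb = LICENSE_WINDOW_GB / LICENSE_WINDOW_DAYS

# Assumption: ingest rate stayed roughly constant over the retention period.
raw_over_retention_gb = daily_raw_gb * retention_days

print(f"retention: {retention_days:.0f} days")
print(f"average ingest: {daily_raw_gb:.1f} GB/day")
print(f"raw data over full retention: {raw_over_retention_gb / 1000:.1f} TB")
```

The result is an estimate of raw (license) volume over a year, not of disk usage. What actually lands on disk also depends on compression, the index (tsidx) files, and any replicated bucket copies, so it won't match this figure directly.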