Getting Data In

Difference in license usage and diskspace usage

jackiewkc
Path Finder

Hi,

I have an index with the following configuration:

[index1]
coldPath = $SPLUNK_DB/index1/colddb
homePath = $SPLUNK_DB/index1/db
thawedPath = $SPLUNK_DB/index1/thaweddb
maxDataSize = auto_high_volume
frozenTimePeriodInSecs = 31536000
maxTotalDataSizeMB = 5000000
repFactor = auto

In the license master, I can see that the cumulative raw data size for index1 is 619GB. However, on the indexers, the size of $SPLUNK_DB/index1/colddb and $SPLUNK_DB/index1/db are 1.8TB and 2.1TB respectively.

Is this right?

Is there any way i can reduce the disk usage with the data retention period unchanged?

Thanks.

Regards,
Jackie

1 Solution

traxxasbreaker
Communicator

OK so it sounds like part 1 on the discrepancy is due to the difference in the _internal index retention which affects the license master logs that will show and the retention on the index shown here. The disk utilization on the indexer represents a much longer time period than what's reflected in _internal.

To address the second part of the question on reducing disk usage without changing the retention period, here's a couple options:

  • Check your cluster master to see if there are excess bucket copies that you can remove to free up some space.
  • If the data is infrequently accessed past a certain age or if slower searches beyond a certain age aren't a concern, look into tsidx reduction.
  • Make sure your replication and search factors aren't too high as this will require additional space for the extra copies.
  • Use something like |dbinspect index=index1 | chart count by guId to make sure you didn't happen to catch much higher than average space usage for that index due to a bucket imbalance. If the bucket counts aren't reasonably close to even, first make sure you don't have an imbalance on incoming data then consider doing a cluster rebalance for just that index (or include others if you notice the problem on other indexes too).

View solution in original post

0 Karma

traxxasbreaker
Communicator

OK so it sounds like part 1 on the discrepancy is due to the difference in the _internal index retention which affects the license master logs that will show and the retention on the index shown here. The disk utilization on the indexer represents a much longer time period than what's reflected in _internal.

To address the second part of the question on reducing disk usage without changing the retention period, here's a couple options:

  • Check your cluster master to see if there are excess bucket copies that you can remove to free up some space.
  • If the data is infrequently accessed past a certain age or if slower searches beyond a certain age aren't a concern, look into tsidx reduction.
  • Make sure your replication and search factors aren't too high as this will require additional space for the extra copies.
  • Use something like |dbinspect index=index1 | chart count by guId to make sure you didn't happen to catch much higher than average space usage for that index due to a bucket imbalance. If the bucket counts aren't reasonably close to even, first make sure you don't have an imbalance on incoming data then consider doing a cluster rebalance for just that index (or include others if you notice the problem on other indexes too).
0 Karma

traxxasbreaker
Communicator

How long are your license master logs being retained? Your cumulative size reported from the license master is limited by your log retention on the _internal index, but it looks like the index you are asking about holds a full year's worth of data.

0 Karma

jackiewkc
Path Finder

Ah ! That explains the difference in the numbers. I didn't change the license master setup so it should be the default 30 days. Thanks a lot !

0 Karma
Get Updates on the Splunk Community!

What's New in Splunk Enterprise 9.4: Features to Power Your Digital Resilience

Hey Splunky People! We are excited to share the latest updates in Splunk Enterprise 9.4. In this release we ...

Take Your Breath Away with Splunk Risk-Based Alerting (RBA)

WATCH NOW!The Splunk Guide to Risk-Based Alerting is here to empower your SOC like never before. Join Haylee ...

SignalFlow: What? Why? How?

What is SignalFlow? Splunk Observability Cloud’s analytics engine, SignalFlow, opens up a world of in-depth ...