Deployment Architecture

Why is the indexers' disk usage more than the SmartStore volume?

vtalanki
Path Finder

Hi,

We have close to 1,000 indexers in our Splunk cluster on AWS. Each indexer has 15 TB of local SSD storage. Our retention is 30 days, and we have SmartStore enabled with AWS S3.

The total S3 bucket size for our cluster is around 9 PB; however, the disk usage on almost all of our indexers is around 95%, which works out to (1000 * 0.95 * 15 TB) ≈ 14.25 PB.

What is taking up the additional ~5 PB of disk space on the indexers? I'm sure the hot data (which isn't on S3) is definitely not 2.5 PB in size, even with RF = 2.
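To sanity-check that assumption, I'm thinking of running something along these lines from a search head. This is just a rough sketch: index=* is illustrative and dbinspect at this scale may need to be narrowed to a few indexes.

| dbinspect index=*
| stats sum(sizeOnDiskMB) AS size_mb by state
| eval size_tb = round(size_mb / 1024 / 1024, 2)

If the hot row comes out well under 2.5 PB, the rest of the local space would have to be warm (cached) copies rather than hot buckets.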

Can someone please shed some light on this?

Thanks.


jdunlea
Contributor

Yeah, I guess what I am getting at here is that this search is still just telling us what the OS knows in terms of how much storage is used. But that is not a guarantee that the storage is necessarily being consumed entirely by indexed data. I mean, it is quite possibly being consumed entirely by indexed data.

 

What does your CMC tell you about the size of your indexed data on disk (the size of the data in the indexes themselves)?
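If the CMC panels are ambiguous, a quick cross-check along these lines might help. It's only a sketch: currentDBSizeMB is the per-index size reported by the data/indexes endpoint, and it should show what Splunk itself thinks the indexes occupy on each indexer.

| rest splunk_server_group=dmc_group_indexer /services/data/indexes
| stats sum(currentDBSizeMB) AS indexed_mb by splunk_server
| eval indexed_tb = round(indexed_mb / 1024 / 1024, 2)
| sort - indexed_tb

Comparing that per-indexer figure against the ~14.25 TB per host that the OS reports would tell us whether the space is really all bucket data.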


jdunlea
Contributor

When you say that the "disk usage" is 95%, presumably this is total disk usage on the indexers as reported by the OS.

 

What does the monitoring console tell you about the actual index sizes on disk? 

In other words, what is Splunk saying its storage usage is for the indexes (buckets)? 
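For example, something like the following (a rough sketch, and dbinspect over index=* can be expensive on a cluster of this size) would sum the bucket sizes per indexer as Splunk sees them, so you can line that number up against the OS-level disk usage:

| dbinspect index=*
| stats sum(sizeOnDiskMB) AS bucket_mb by splunk_server
| eval bucket_tb = round(bucket_mb / 1024 / 1024, 2)
| sort - bucket_tb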

 

 


vtalanki
Path Finder

@jdunlea We got these disk usage numbers from the Splunk Monitoring Console itself.

In fact, I also ran this query, which returns 900+ indexers at 96% usage:

| rest splunk_server_group=dmc_group_indexer /services/server/status/partitions-space
| eval free = if(isnotnull(available), available, free)
| eval usage = capacity - free
| eval pct_usage = floor(usage / capacity * 100)
| where pct_usage > 80
| stats first(fs_type) as fs_type first(capacity) AS capacity first(usage) AS usage first(pct_usage) AS pct_usage by splunk_server, mount_point
| eval usage = round(usage / 1024, 2)
| eval capacity = round(capacity / 1024, 2)
| rename splunk_server AS Instance mount_point as "Mount Point", fs_type as "File System Type", usage as "Usage (GB)", capacity as "Capacity (GB)", pct_usage as "Usage (%)"
