Why is the indexers' disk usage greater than the SmartStore volume?

vtalanki
Path Finder

Hi,

We have close to 1000 indexers in our Splunk cluster on AWS. Each indexer has 15 TB of local SSD storage. Our retention is 30 days and we have SmartStore enabled with AWS S3.

The total S3 bucket size for our cluster is around 9 PB, yet the disk usage on almost all of our indexers is around 95%, which works out to (1000 * 0.95 * 15 TB) ≈ 14.2 PB.

What is taking up the additional ~5 PB of disk space on the indexers? I'm sure the hot data (which isn't on S3) is definitely not 2.5 PB in size (RF = 2).
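
For reference, the arithmetic behind these figures can be sketched as below. This is only a back-of-the-envelope check, not anything Splunk produces: the names are made up for illustration, and it assumes decimal units (1 PB = 1000 TB) and that every indexer sits at exactly 95%.

```python
# Back-of-the-envelope check of the figures quoted in the post.
# Assumptions (not from Splunk): decimal units (1 PB = 1000 TB),
# and every indexer at exactly 95% local disk usage.

INDEXERS = 1000        # approximate indexer count
LOCAL_DISK_TB = 15     # local SSD per indexer, in TB
USED_FRACTION = 0.95   # reported disk usage per indexer
S3_TOTAL_PB = 9        # reported size of the SmartStore S3 bucket


def local_usage_pb(indexers, disk_tb, used_fraction):
    """Total local disk in use across the cluster, in PB."""
    return indexers * disk_tb * used_fraction / 1000


local_pb = local_usage_pb(INDEXERS, LOCAL_DISK_TB, USED_FRACTION)
gap_pb = local_pb - S3_TOTAL_PB  # local usage not accounted for by the S3 total

print(f"local disk in use: {local_pb:.2f} PB")
print(f"gap versus S3:     {gap_pb:.2f} PB")
```

The gap it prints is the "~5 PB" being asked about.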

Can someone please shed some light on this?

Thanks.

jdunlea
Contributor

Yeah, what I am getting at is that this search still only tells us what the OS knows about how much storage is used. That is not a guarantee that the storage is being consumed entirely by indexed data, although it quite possibly is.

What does your CMC tell you about the size of your indexed data on disk (the size of the data in the indexes themselves)?

jdunlea
Contributor

When you say that the "disk usage" is 95%, presumably this is the total disk usage on the indexers as reported by the OS.

What does the monitoring console tell you about the actual index sizes on disk? In other words, how much storage does Splunk itself report for the indexes (buckets)?

vtalanki
Path Finder

@jdunlea We got these disk usage numbers from the Splunk MC itself.

In fact, I also ran this query, which returns 900+ indexers at 96% usage:

| rest splunk_server_group=dmc_group_indexer /services/server/status/partitions-space
| eval free = if(isnotnull(available), available, free)
| eval usage = capacity - free
| eval pct_usage = floor(usage / capacity * 100)
| where pct_usage > 80
| stats first(fs_type) as fs_type first(capacity) AS capacity first(usage) AS usage first(pct_usage) AS pct_usage by splunk_server, mount_point
| eval usage = round(usage / 1024, 2)
| eval capacity = round(capacity / 1024, 2)
| rename splunk_server AS Instance mount_point as "Mount Point", fs_type as "File System Type", usage as "Usage (GB)", capacity as "Capacity (GB)", pct_usage as "Usage (%)"
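
For anyone less familiar with SPL, the per-partition arithmetic in the query above is roughly the following. This is a sketch in Python, not Splunk code: the function name and the sample numbers are invented, and it assumes the endpoint reports sizes in MB, which is what the division by 1024 to get GB implies.

```python
import math


def partition_usage(capacity_mb, free_mb, available_mb=None):
    """Mirror the query's eval steps for a single partition.

    Hypothetical helper for illustration; inputs are assumed to be in MB.
    """
    # eval free = if(isnotnull(available), available, free)
    free = available_mb if available_mb is not None else free_mb
    # eval usage = capacity - free
    used_mb = capacity_mb - free
    # eval pct_usage = floor(usage / capacity * 100)
    pct_usage = math.floor(used_mb * 100 / capacity_mb)
    return {
        "usage_gb": round(used_mb / 1024, 2),
        "capacity_gb": round(capacity_mb / 1024, 2),
        "pct_usage": pct_usage,
    }


# Invented sample: a partition of about 15 TB with 600,000 MB still free.
sample = partition_usage(capacity_mb=15_000_000, free_mb=600_000)
print(sample)
```

Note that this only reflects what the filesystem reports for each mount point. It says nothing about which files are using the space.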
