Deployment Architecture

Why is the indexers' disk usage more than the SmartStore volume?

vtalanki
Path Finder

Hi,

We have close to 1000 indexers in our Splunk cluster on AWS. Each indexer has 15 TB of local SSD storage. Our retention is 30 days, and we have SmartStore enabled with AWS S3.

The total S3 bucket size for our cluster is around 9 PB; however, disk usage on almost all of our indexers is around 95%, which works out to (1000 * 0.95 * 15 TB) = about 14.2 PB.

What is taking up the additional ~5 PB of disk space on the indexers? I'm sure the hot data (which isn't on S3) is definitely not 2.5 PB (RF = 2) in size.
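
For reference, the arithmetic behind these numbers can be sketched as below. This is only a rough back-of-the-envelope check, assuming decimal units (1 PB = 1000 TB) and taking the 9 PB S3 figure, the 95% usage and RF = 2 from the description above; the function and variable names are made up for illustration.

```python
# Rough sanity check of the numbers in the question.
# Assumption: decimal units, 1 PB = 1000 TB.

def local_disk_usage_pb(indexers: int, disk_tb: float, used_fraction: float) -> float:
    """Total local disk in use across the cluster, in PB."""
    return indexers * disk_tb * used_fraction / 1000


used_pb = local_disk_usage_pb(indexers=1000, disk_tb=15, used_fraction=0.95)
s3_pb = 9.0                      # reported S3 bucket size
gap_pb = used_pb - s3_pb         # local usage not explained by the S3 size
replication_factor = 2           # RF = 2
unique_pb = gap_pb / replication_factor  # gap if it were all replicated hot data

print(f"local usage: {used_pb:.2f} PB")
print(f"gap vs S3:   {gap_pb:.2f} PB")
print(f"gap / RF:    {unique_pb:.3f} PB")
```

So for the gap to be explained by hot data alone, there would have to be roughly 2.5 PB of unique hot data, which does not seem plausible.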

Can someone please shed some light on this?

Thanks.


jdunlea
Contributor

Yeah, I guess what I am getting at is that this search still only tells us what the OS knows about how much storage is used. That does not guarantee the storage is being consumed entirely by indexed data, although it quite possibly is.

 

What does your CMC tell you about the size of your indexed data on disk (the size of the data in the indexes themselves)?


jdunlea
Contributor

When you say that the "disk usage" is 95%, presumably this is total disk usage on the indexers as reported by the OS.

 

What does the monitoring console tell you about the actual index sizes on disk? 

In other words, how much storage does Splunk itself report for the indexes (buckets)?

 

 


vtalanki
Path Finder

@jdunlea We got these disk usage numbers from the Splunk MC itself.

In fact, I also ran this query, which returns 900+ indexers at 96% usage:

| rest splunk_server_group=dmc_group_indexer /services/server/status/partitions-space
| eval free = if(isnotnull(available), available, free)
| eval usage = capacity - free
| eval pct_usage = floor(usage / capacity * 100)
| where pct_usage > 80
| stats first(fs_type) as fs_type first(capacity) AS capacity first(usage) AS usage first(pct_usage) AS pct_usage by splunk_server, mount_point
| eval usage = round(usage / 1024, 2)
| eval capacity = round(capacity / 1024, 2)
| rename splunk_server AS Instance mount_point as "Mount Point", fs_type as "File System Type", usage as "Usage (GB)", capacity as "Capacity (GB)", pct_usage as "Usage (%)"
