Why is the indexers' disk usage greater than the SmartStore volume?

vtalanki
Path Finder

Hi,

We have close to 1000 indexers in our Splunk cluster on AWS. Each indexer has 15 TB of local SSD storage. Our retention is 30 days, and we have SmartStore enabled with AWS S3.

The total S3 bucket size for our cluster is around 9 PB. However, disk usage on almost all of our indexers is around 95%, which works out to (1000 * 0.95 * 15 TB) = ~14.25 PB.

What is taking up the additional ~5 PB of disk space on the indexers? I'm sure the hot data (which isn't on S3) is definitely not 2.5 PB in size (RF = 2).
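The arithmetic behind these figures can be sketched as follows. This is only a minimal illustration, assuming decimal units (1 PB = 1000 TB) and that every indexer sits at the same usage level; the function name `disk_gap_pb` and its parameters are hypothetical and not part of any Splunk tooling.

```python
def disk_gap_pb(indexers, capacity_tb, used_fraction, s3_pb, replication_factor):
    """Compare total local disk usage with the size reported for the S3 bucket.

    Returns a tuple (local_used_pb, gap_pb, gap_per_copy_pb), all in decimal PB.
    """
    # Total local disk in use across the cluster, converted from TB to PB.
    local_used_pb = indexers * capacity_tb * used_fraction / 1000
    # Local usage that the S3 bucket size does not account for.
    gap_pb = local_used_pb - s3_pb
    # If the whole gap were replicated data, this is the size of a single copy.
    gap_per_copy_pb = gap_pb / replication_factor
    return local_used_pb, gap_pb, gap_per_copy_pb


if __name__ == "__main__":
    # Figures taken from the post: 1000 indexers, 15 TB each, 95% used,
    # 9 PB in S3, replication factor 2.
    local, gap, per_copy = disk_gap_pb(1000, 15, 0.95, 9, 2)
    print(f"local usage: {local:.3f} PB")
    print(f"gap vs S3:   {gap:.3f} PB")
    print(f"per copy:    {per_copy:.3f} PB")
```

With the numbers from the post, the gap comes to roughly 5 PB, or about 2.5 PB per copy at RF = 2, which matches the estimate in the question.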

Can someone please shed some light on this?

Thanks.

jdunlea
Contributor

Yeah, I guess what I am getting at is that this search still only tells us what the OS knows about how much storage is used. That is not a guarantee that the storage is being consumed entirely by indexed data, although it quite possibly is.

What does your CMC tell you about the size of your indexed data on disk (the size of the data in the indexes themselves)?

jdunlea
Contributor

When you say that the "disk usage" is 95%, presumably this is the total disk usage on the indexers as reported by the OS.

What does the Monitoring Console tell you about the actual index sizes on disk?

In other words, how much storage does Splunk itself report for the indexes (buckets)?

vtalanki
Path Finder

@jdunlea We got these disk usage numbers from the Splunk MC itself.

In fact, I also ran this query, which returns 900+ indexers at 96% usage:

| rest splunk_server_group=dmc_group_indexer /services/server/status/partitions-space
| eval free = if(isnotnull(available), available, free)
| eval usage = capacity - free
| eval pct_usage = floor(usage / capacity * 100)
| where pct_usage > 80
| stats first(fs_type) as fs_type first(capacity) AS capacity first(usage) AS usage first(pct_usage) AS pct_usage by splunk_server, mount_point
| eval usage = round(usage / 1024, 2)
| eval capacity = round(capacity / 1024, 2)
| rename splunk_server AS Instance mount_point as "Mount Point", fs_type as "File System Type", usage as "Usage (GB)", capacity as "Capacity (GB)", pct_usage as "Usage (%)"
