How to report Detailed Disk usage in a distributed environment?

bkumarm
Contributor

We have a setup with 5 search heads, 20 indexers, 5 universal forwarders, 5 heavy forwarders, 2 license masters, 3 cluster masters, and one deployment server.
We have about 30 indexes spread across multiple indexers. Each index receives data from multiple sourcetypes.
We need to generate a disk usage report that shows:
1. Disk usage by sourcetype, per index
2. Index by host and sourcetype
3. Overall disk usage
4. Allocated disk vs. consumed disk, and % consumption

We explored the following options but have not yet arrived at a final report. Any help is appreciated.

Requirement:
Report disk usage statistics by index, broken down by sourcetype.

Evaluation/Implementation:
The following Options were explored for Storage/Disk volume reporting:

  1. Search queries on metrics.log (Ref 1)
    Example:
    index=_internal metrics kb group=per_sourcetype_thruput | eval sizeMB = round(kb/1024,2) | stats sum(sizeMB) by series | sort -sum(sizeMB) | rename sum(sizeMB) AS "Size on Disk (MB)"

metrics.log provides per-sourcetype volume (throughput) information, which can be consolidated into a tabular view of the statistics. However, this data is NOT broken down by index.
It can be presented relative to hosts (indexers).

  2. dbinspect command (Ref 4)
    Example:
    | dbinspect index=* | stats sum(sizeOnDiskMB) AS TotalSizeOnDiskMB, sum(rawSize) AS TotalRawSize by index, splunk_server

dbinspect does not provide disk usage by sourcetype. It gives a view of index vs. Splunk server (indexer).

  3. license_usage.log
    This log records the data volume that Splunk servers have ingested and indexed. Disk consumption may differ depending on the compression factor and retention period,
    so we cannot assume that the volume shown in license_usage.log is the same as the disk used.

  4. introspect (Ref 5)
    This REST API reports the data volume consumed per index, not per sourcetype.

  5. REST API: services/data/indexes
    Example:
    | rest /services/data/indexes/ count=0 | rename title AS Index splunk_server AS Indexer currentDBSizeMB AS usage maxTotalDataSizeMB AS size | stats sum(usage) AS usage values(size) AS size by Index, Indexer | eval DiskPer=((usage*100)/size) | rename usage AS "DiskUsage(MB)", size AS "DiskQuota(MB)", DiskPer AS "Used(%)"

This reports disk usage by index only. Sourcetype-level data is not available here.

Ref:
1. https://answers.splunk.com/answers/173541/is-there-a-way-to-determine-how-much-disk-space-my.html?ut...
2. https://answers.splunk.com/answers/373506/how-to-generate-storage-and-license-usage-reportin-1.html?...
3. https://answers.splunk.com/answers/374892/does-the-license-master-have-disk-usage-info-from.html?utm...
4. http://docs.splunk.com/Documentation/Splunk/6.1.4/SearchReference/Dbinspect
5. http://docs.splunk.com/Documentation/Splunk/latest/RESTREF/RESTintrospect

bkumarm
Contributor

Though late, I thought it would be good to post the resolution:
The new version of the DMC (Distributed Management Console) in Splunk provides all the features I need.
I also realized that we can monitor these parameters from the master node and the deployment server.

gjanders
SplunkTrust

From what I can see, there is no possible answer to your question. I would ask why you need to measure sourcetypes inside an index.
Is that for billing purposes? And if so, why not just separate the required sourcetypes into different indexes?

You could approximate the number by counting the events per sourcetype inside an index and estimating each sourcetype's share of the index size, but I don't think you can obtain exact numbers.
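
As a rough sketch of that event-count approach (not something confirmed in this thread), the search below splits each index's on-disk size across its sourcetypes in proportion to event count. It uses only tstats, eventstats, join, and the dbinspect field sizeOnDiskMB already shown in the question. The field names events, index_events, index_disk_mb, and est_disk_mb are made up for illustration. It assumes events of different sourcetypes are similar in size and compress similarly, which is often not true, and in a clustered environment dbinspect may also count replicated bucket copies, so treat the result as an estimate only.

```spl
| tstats count AS events where index=* by index, sourcetype
| eventstats sum(events) AS index_events by index
| join type=left index
    [| dbinspect index=*
     | stats sum(sizeOnDiskMB) AS index_disk_mb by index]
| eval est_disk_mb = round(index_disk_mb * events / index_events, 2)
| table index, sourcetype, events, est_disk_mb
| sort index, -est_disk_mb
```

Run it over a time range that matches the retention of the indexes, otherwise the event counts and the bucket sizes will not describe the same data.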

You can obtain exact numbers for incoming data, then check the compression ratio and approximate that way.
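
A minimal sketch of that second idea, under stated assumptions: ingested volume per index and sourcetype is taken from license_usage.log (type=Usage events, with b as bytes, idx as index, and st as sourcetype), and a per-index compression ratio is derived from dbinspect as size on disk divided by raw size (rawSize is reported in bytes). The names ingested_mb, ratio, and est_disk_mb are illustrative. It assumes the license master's _internal data is searchable from where you run it, that one ratio applies equally to every sourcetype in an index, and that the search window is comparable to the data still retained on disk.

```spl
index=_internal source=*license_usage.log type=Usage
| stats sum(b) AS ingested_bytes by idx, st
| rename idx AS index, st AS sourcetype
| eval ingested_mb = round(ingested_bytes / 1024 / 1024, 2)
| join type=left index
    [| dbinspect index=*
     | stats sum(sizeOnDiskMB) AS disk_mb, sum(rawSize) AS raw_bytes by index
     | eval ratio = disk_mb / (raw_bytes / 1024 / 1024)
     | fields index, ratio]
| eval est_disk_mb = round(ingested_mb * ratio, 2)
| table index, sourcetype, ingested_mb, ratio, est_disk_mb
```

Because the ratio is measured per index rather than per sourcetype, a sourcetype that compresses unusually well or badly will be over- or under-estimated.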

bkumarm
Contributor

Your questions prompted me to look in more detail, which led to the resolution I posted in this thread.
Thank you!

gjanders
SplunkTrust

No problem, thanks for following up the question 🙂
