
Calculate index size increase over 24 hours (no admin privileges)

asing13
Explorer

Hi Team,

Please note: I don't have admin privileges, so I can't run queries on the _internal index.

I want to calculate the amount of data ingested into Splunk to evaluate licensing and disk space needs.

I have two queries, which I ran over both 24 hours and 30 days. In both cases, the two queries gave very different outputs.

Query 1:
index=* | eval size=len(_raw) | eval gbsize=(size/1024/1024/1024) | stats sum(gbsize) by index
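As a rough illustration of what Query 1 computes, here is a minimal Python sketch that sums the length of each raw event per index and divides by 1024³ to get GB. The event list and the function name raw_size_by_index are hypothetical. It reports both a character count (SPL's len() counts characters rather than bytes) and a UTF-8 byte count, since the two diverge for non-ASCII events.

```python
from collections import defaultdict

GB = 1024 ** 3  # bytes per GB, matching size/1024/1024/1024 in Query 1


def raw_size_by_index(events):
    """Sum event lengths per index, like `eval size=len(_raw) | stats sum(...) by index`.

    `events` is a list of (index, raw_text) pairs (hypothetical sample data).
    Returns {index: (chars_in_gb, utf8_bytes_in_gb)}.
    """
    chars = defaultdict(int)
    nbytes = defaultdict(int)
    for index, raw in events:
        chars[index] += len(raw)                   # character count, like SPL len()
        nbytes[index] += len(raw.encode("utf-8"))  # actual UTF-8 byte count
    return {idx: (chars[idx] / GB, nbytes[idx] / GB) for idx in chars}


if __name__ == "__main__":
    sample = [("web", "GET /index.html 200"), ("web", "café login ok"), ("os", "cpu=42")]
    for idx, (c_gb, b_gb) in raw_size_by_index(sample).items():
        print(idx, c_gb, b_gb)
```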

Query 2:
| dbinspect index=* | eval sizeOnDiskGB=(sizeOnDiskMB/1024) | stats sum(rawSize) AS rawTotal, sum(sizeOnDiskGB) AS diskTotalinGB by index | sort -diskTotalinGB
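One thing worth checking in Query 2 is units: dbinspect reports rawSize in bytes but sizeOnDiskMB in megabytes, so rawTotal and diskTotalinGB come out in different units. Here is a minimal Python sketch of the same aggregation with both columns converted to GB, run over hypothetical bucket rows (the function name bucket_totals_gb is also hypothetical):

```python
BYTES_PER_GB = 1024 ** 3
MB_PER_GB = 1024


def bucket_totals_gb(buckets):
    """Aggregate per-bucket rows (shaped like dbinspect output) into per-index GB totals.

    `buckets` is a list of dicts with keys 'index', 'rawSize' (bytes) and
    'sizeOnDiskMB' (megabytes). Returns {index: {'raw_gb', 'disk_gb'}},
    ordered by disk_gb descending (like `sort -diskTotalinGB`).
    """
    totals = {}
    for b in buckets:
        t = totals.setdefault(b["index"], {"raw_gb": 0.0, "disk_gb": 0.0})
        t["raw_gb"] += b["rawSize"] / BYTES_PER_GB     # bytes -> GB
        t["disk_gb"] += b["sizeOnDiskMB"] / MB_PER_GB  # MB -> GB
    return dict(sorted(totals.items(), key=lambda kv: kv[1]["disk_gb"], reverse=True))
```

Because dbinspect works per bucket, a bucket that only partly overlaps the search window may be counted in full, which could account for part of the gap between the two queries.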

For some indexes the results differ by around 3-4 GB, so I presume that one or both of these queries might not be correct.
Could anyone suggest which one is the correct one to use? As mentioned above, I can't use _internal.

Thanks in advance!

Regards,
Abhishek Singh


richgalloway
SplunkTrust

In my testing, the first query more closely matches the result from the internal index.  Of course, it does not work at all on metric indexes.

---
If this reply helps you, an upvote would be appreciated.


arjunpkishore5
Motivator

Query 1 is more useful for your licensing needs, since it shows you the amount of data being ingested.

Query 2 is more useful for forecasting disk space, since it calculates the disk space used after replication and field extractions are done.

asing13
Explorer

Thank you @arjunpkishore5 !

I used these two queries:

| eventcount summarize=false index=* report_size=true | eval GB=(size_bytes/1024)/(1024*1024) | stats sum(GB) as total by index | sort -total

And

| tstats earliest(_indextime) AS indexing_time where index=* OR index=_* by index | convert ctime(indexing_time) timeformat="%m/%d/%Y"

The first query gave me the total disk space the event logs take (without replication). I checked and found that the replication factor is 3, so I multiplied the values by 3 to get the total disk space required by each index (with replication).

Next, I used the second query to get the date on which each index first received data.

Subtracting that date from the current date gave me the number of days each index has been in use.
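A minimal Python sketch of that day count, assuming you work from the raw epoch value of the earliest _indextime (that is, before the convert ctime(...) step formats it as %m/%d/%Y). The function name days_in_use is hypothetical, and the result is in whole days, with any partial day dropped.

```python
from datetime import datetime, timezone


def days_in_use(first_indextime_epoch, now=None):
    """Whole days between an index's earliest _indextime (epoch seconds) and now (UTC)."""
    first = datetime.fromtimestamp(first_indextime_epoch, tz=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return (now - first).days  # timedelta.days truncates partial days
```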

Finally, I divided the total disk space by that number of days to get the average daily disk usage, and used it to estimate the disk space needed for the next 3, 6, 9 and 12 months.
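The projection described above is linear arithmetic. Here is a minimal Python sketch of it: the replication factor of 3 comes from this post, while the 30-day month, the assumption of linear growth, and the function name project_disk_gb are hypothetical simplifications. The sketch returns the additional replicated disk space expected over each horizon.

```python
def project_disk_gb(current_gb, days_in_use, replication_factor=3,
                    horizons_months=(3, 6, 9, 12), days_per_month=30):
    """Linear disk projection: (current size x RF / days in use) x days in horizon.

    current_gb: unreplicated size of the index today (e.g. from eventcount).
    Returns {months: additional_gb_with_replication}.
    """
    if days_in_use <= 0:
        raise ValueError("days_in_use must be positive")
    daily_gb = current_gb * replication_factor / days_in_use  # average replicated GB/day
    return {m: daily_gb * m * days_per_month for m in horizons_months}


if __name__ == "__main__":
    # Hypothetical index: 100 GB today, first data indexed 50 days ago.
    print(project_disk_gb(100.0, 50))
```

Because eventcount reports only the data currently retained, the daily average may be understated if some older data has already aged out under the index's retention settings.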

I hope this approach is OK.

Regards,
Abhishek Singh


asing13
Explorer

Thanks @richgalloway!
I asked a colleague to check the Cloud Monitoring app, and it seems that Query 1 is closer to the ingested data volume shown in the app.

I have a follow-up question. If I need to calculate how much disk space will be required at the server level, which factors do I need to check? I believe there is a multiplication factor (for data replication) that determines the space the data takes. What other factors do I need to take into account before coming to a conclusion?
