Hi Team,
Please note: I do not have admin privileges, so I cannot run queries on the _internal index.
I want to calculate the amount of data ingested into Splunk in order to evaluate licensing and disk space needs.
I have two queries, which I ran over 24 hours and over 30 days. In both time ranges the two queries returned very different results.
Query 1:
index=* | eval size=len(_raw) | eval gbsize=(size/1024/1024/1024) | stats sum(gbsize) by index
Query 2:
| dbinspect index=* | eval sizeOnDiskGB=(sizeOnDiskMB/1024) | stats sum(rawSize) AS rawTotal, sum(sizeOnDiskGB) AS diskTotalinGB by index | sort -diskTotalinGB
I see a difference of around 3-4 GB for some indexes, so I presume that one of these queries, or even both, might not be correct.
Can anyone suggest which one is the correct one to use? As mentioned above, I can't use _internal.
Thanks in advance!
Regards,
Abhishek Singh
In my testing, the first query more closely matches the result from the internal index. Of course, it does not work at all on metric indexes.
Query 1 is more useful for your licensing needs, since it shows the amount of raw data being ingested.
Query 2 is more useful for forecasting disk space, since it reports the disk space used after replication and field extractions are done.
Thank you @arjunpkishore5!
I used these two queries:
| eventcount summarize=false index=* report_size=true | eval GB=(size_bytes/1024)/(1024*1024) | stats sum(GB) as total by index | sort -total
And
| tstats earliest(_indextime) AS indexing_time where index=* OR index=_* by index | convert ctime(indexing_time) timeformat="%m/%d/%Y"
The first query gave me the disk space that the event logs take up in each index, without the replication factor. I checked and found that the replication factor is 3.
I multiplied those values by 3 to get the total disk space required by each index with replication.
Next, I used the second query to get the date on which indexing first happened for each index.
I subtracted that date from the current date to get the total number of days each index has been in use.
Finally, I divided the replicated disk space by the number of days to get the average disk space required per day, and used that to estimate the disk space needed for the next 3 months, 6 months, 9 months and 12 months.
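The arithmetic above can be written as a small Python sketch. The function name and all of the numbers in the example are hypothetical and are not taken from my environment. It also assumes that ingestion stays roughly constant and that no data has already been aged out by retention, which may not hold in practice.

```python
from datetime import date


def forecast_disk_gb(current_gb, replication_factor, first_indexed, today, horizon_days):
    """Linearly project the disk space (GB) consumed over the next horizon_days.

    current_gb         -- size reported for one index, without replication
    replication_factor -- number of copies kept in the cluster
    first_indexed      -- date of the earliest _indextime for that index
    today              -- date on which the sizes were measured
    horizon_days       -- length of the forecast window in days
    """
    days_in_use = (today - first_indexed).days
    if days_in_use <= 0:
        raise ValueError("first_indexed must be earlier than today")
    replicated_gb = current_gb * replication_factor  # size with replication
    daily_gb = replicated_gb / days_in_use           # average growth per day
    return daily_gb * horizon_days


# Hypothetical example: 120 GB in one index, replication factor 3,
# first event indexed 100 days before the measurement date.
for days in (90, 180, 270, 360):
    estimate = forecast_disk_gb(120.0, 3, date(2020, 1, 1), date(2020, 4, 10), days)
    print(f"{days} days: {estimate:.1f} GB")
```

The result is the additional space consumed during the forecast window, so it has to be added to the space that is already in use.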
I hope this approach is reasonable.
Regards,
Abhishek Singh
Thanks @richgalloway!
I asked a colleague to check the cloud monitoring app, and Query 1 appears to be closer to the ingested data volume shown in the app.
I have a follow-up question. If I need to calculate how much disk space will be required at the server level, which factors do I need to check? I believe there is a multiplication factor (for replication of data) that determines the space taken by the data. What other factors do I need to consider before coming to a conclusion?