Archive

How to calculate the amount of data being archived by Splunk?

Communicator

Hi, I have a distributed Splunk setup in AWS with retention policies in place. I am archiving the old data to Amazon S3.

My question is: how can I calculate the volume of data archived from Splunk to S3 on a daily basis? Is there a search using the _internal index or | metadata that I can use? I would prefer to do it within Splunk itself.

Splunk Employee

How are you archiving this to S3?

Communicator

I have a script that archives the frozen data to S3.

We used this Atlassian script and customized it to our needs.

Splunk Employee

I'm not familiar with their script, but in general there are no internal metrics for external scripts that copy data in this way.

I'd recommend adding metrics calculations to the script and logging the results to Splunk. For example, have the copy script write a log file after each copy that includes a timestamp, the files copied, the transfer rate, and the size of the data copied.

Ingest that into Splunk, and you can monitor what has been archived, how much has been archived, and to where.
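
As a rough illustration of the logging suggested above, here is a minimal Python sketch. It is not part of the Atlassian script; the function names, the field names (files_copied, bytes_copied, bytes_per_second) and the log file location are all assumptions made for this example. It writes one key=value line per archived bucket, a format Splunk can usually extract fields from at search time.

```python
import time


def format_archive_event(bucket_path, files_copied, bytes_copied,
                         elapsed_seconds, now=None):
    """Build one key=value log line describing a finished bucket copy.

    All field names here are hypothetical; pick whatever suits your searches.
    """
    # time.gmtime(None) uses the current time; pass `now` to fix the timestamp.
    timestamp = time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime(now))
    # Avoid dividing by zero for copies that finish almost instantly.
    rate = bytes_copied / elapsed_seconds if elapsed_seconds > 0 else 0.0
    return (
        f'{timestamp} action=archive bucket="{bucket_path}" '
        f"files_copied={files_copied} bytes_copied={bytes_copied} "
        f"elapsed_seconds={elapsed_seconds:.2f} bytes_per_second={rate:.0f}"
    )


def log_archive_event(log_path, line):
    """Append the line to a log file that a Splunk file monitor could read."""
    with open(log_path, "a", encoding="utf-8") as handle:
        handle.write(line + "\n")
```

The archive script would call format_archive_event after each successful upload and pass the result to log_archive_event. Once that file is monitored, summing bytes_copied per day (for example with timechart span=1d) should give the daily archive volume.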

Communicator

This script only moves data from Splunk to S3; it does not know what data, or how much, is being moved. Splunk Enterprise decides what and how much data to move, so my thought is that Splunk knows exactly the volume of data it is moving.

Splunk Employee

This is correct in one way and incorrect in another.

The bucket rolling/freezing process doesn't care about the size of the buckets, only the age of the data inside them. Once the bucketmover is tasked with freezing a bucket, it calls the script. It doesn't pass the bucket size or contents, roughly just the bucket path. So at this point Splunk doesn't know the size of the bucket. You would need to run dbinspect before these buckets are rolled to find their size on disk, store that somewhere, and then correlate the buckets copied with their sizes as listed in that lookup/KV store.

Check out this Answers post on what rolling to frozen looks like from Splunk's side:
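
Since the script receives roughly just the bucket path, an alternative to the dbinspect lookup is to measure the bucket directory inside the script before copying it. The following is a minimal Python sketch under that assumption; bucket_size is a hypothetical helper, not something Splunk or the Atlassian script provides.

```python
import os


def bucket_size(bucket_path):
    """Return (file_count, total_bytes) for regular files under bucket_path."""
    file_count = 0
    total_bytes = 0
    # os.walk yields nothing for a missing path, so the result is (0, 0).
    for root, _dirs, files in os.walk(bucket_path):
        for name in files:
            full_path = os.path.join(root, name)
            # Skip symlinks so a link is not counted as the size of its target.
            if os.path.islink(full_path):
                continue
            total_bytes += os.path.getsize(full_path)
            file_count += 1
    return file_count, total_bytes
```

A frozen script written in Python could call bucket_size(sys.argv[1]), assuming the bucket path arrives as the first argument, and include the result in whatever it logs. Note this is the size on disk at freeze time, which may differ from the size of the object that ends up in S3 if the script compresses or trims the bucket first.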
https://answers.splunk.com/answers/117988/halp-my-data-is-being-rolled-to-frozen-and-i-dont-know-why...

Again though, there is no logging from third-party frozen scripts. So if you want to know what the script itself is doing, you should add logging to that script and then ingest that log into Splunk.

Communicator

Yes, I am clear about it now. I understand that Splunk does not track the size of the data it is removing, even though we assumed it did. It is better that we change our script and monitor the archiving that way.

Thanks for your quick response and clarifying things. Upvoting your answers.

Splunk Employee

On a side note, Atlassian is very open to feedback. I'd raise this as a feature request on their GitHub project page. Perhaps they'll add it!