Getting Data In

Alternatives to Rehydrating Logs

durnan13
Explorer

Hello Everyone!

We have what we've been told is a less-than-ideal setup: data is searchable for 90 days and retained in archive for a year. We commonly have troubleshooting efforts where the issue happened 6 months ago (outside our searchable window), which forces us to rehydrate the data from our archives before anyone can search it. There are a lot of downsides to this.

1. We can only submit those requests as admins (which is for the best, I know).
2. They can take 6+ hours for the data to restore before anyone can proceed with their troubleshooting.
3. We are limited in how much data we can restore at a time.

Right now we are hitting a situation where a large investigation required rehydrating a lot of data, and that is now preventing us from rehydrating any other data for smaller investigations. We can't clear the larger data set because the investigation is still ongoing.

We thought about looking at S3 federated searches instead of going out to the archives and dealing with log rehydration. However, there are concerns that this wouldn't really get us where we need to be either.

We would love to look at other options. What are others doing to help in this process? How are you handling investigations outside your normal searchable date ranges?


livehybrid
SplunkTrust

Hi @durnan13 

So...I have had this issue in the past with a previous customer - we ultimately resolved it with a potentially controversial and unsupported technique, which was to use the 'coldToFrozenScript' setting to run a script that converts each frozen bucket into a SmartStore bucket and uploads it to S3, under an index that is set up for SmartStore storage.

Now...the new buckets won't be detected immediately, because the CM needs to be made aware of them, but a periodic restart should sort that. There is a community post at https://community.splunk.com/t5/Knowledge-Management/smartstore-How-to-map-S2-smartstore-buckets-to-... which covers how the SmartStore folder structure maps to the local file structure; other than the path layout, the only real difference between local and SmartStore buckets, I think, is the receipt.json, which is basically a list of the files and their sizes.
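To help picture the moving parts, here is a rough indexes.conf sketch of the hook plus the SmartStore-backed target index. All index names, paths and the S3 bucket are made-up placeholders, and the conversion script itself (which has to rewrite the bucket into the SmartStore layout and generate the receipt.json) is not shown - the linked community post covers that mapping.

  # indexes.conf (illustrative only)
  # Source index: at freeze time, hand the bucket to a script instead of deleting it.
  # Splunk passes the bucket directory as the script's only argument.
  [my_source_index]
  homePath   = $SPLUNK_DB/my_source_index/db
  coldPath   = $SPLUNK_DB/my_source_index/colddb
  thawedPath = $SPLUNK_DB/my_source_index/thaweddb
  coldToFrozenScript = "$SPLUNK_HOME/bin/python3" "/opt/splunk/scripts/frozen_to_smartstore.py"

  # Target "archive" index backed by SmartStore, pointing at the same S3 prefix
  # the script uploads the converted buckets to.
  [my_archive_index]
  homePath   = $SPLUNK_DB/my_archive_index/db
  coldPath   = $SPLUNK_DB/my_archive_index/colddb
  thawedPath = $SPLUNK_DB/my_archive_index/thaweddb
  remotePath = volume:archive_store/$_index_name

  [volume:archive_store]
  storageType = remote
  path = s3://my-archive-bucket/smartstore
  remote.s3.endpoint = https://s3.us-east-1.amazonaws.com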

This kind of approach can end up being a good compromise: frozen storage is slow and hard to recover from, whereas this is more complex to set up but lets you search historic data at a moment's notice.

Obviously there are a number of caveats with this approach, not to mention the support angle, but it is feasible if quick access to 'archived' data is something you need. I would recommend reaching out to a local partner to see whether this sort of thing is something they could help with.

Of course, there is an alternative, which would be to increase your retention to cover the maximum time you will ever need access to - this would require enough disk space to hold the historic data...
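For anyone self-managing their indexers, the retention knobs live in indexes.conf - something along these lines, with purely illustrative values (on Splunk Cloud, searchable retention is instead set per index through the cloud admin tooling):

  # indexes.conf (illustrative only)
  [my_source_index]
  # Roll events to frozen after ~2 years instead of the ~6-year default
  frozenTimePeriodInSecs = 63072000
  # The size-based limit can also trigger freezing, so raise it too if needed
  maxTotalDataSizeMB = 1000000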


durnan13
Explorer

We are actually being told our retention is beyond normal standards. Currently our Splunk reps are telling us we should really only be keeping data searchable for something like 7 to 14 days, with an even shorter retention from a frozen-storage point of view.

In my mind, if we had a bigger investigation that needed to be done against frozen storage as we have it today, something like restoring the data into a summary index of sorts would be better than rehydrating it into our standard indexes. However, I haven't personally done a lot of work with that. I think smaller requests then wouldn't be so bad, but that 6-hour wait time is rough and can definitely slow down research efforts.


livehybrid
SplunkTrust

Hi @durnan13 

I would say 14 days is nowhere near too long… I have a customer with loads of data on a 5-year retention and many indexes at 1-3 years. It obviously costs more for disk, but you could architect it well and put cold buckets on cheaper disks; that way you benefit from instant retrieval, even if searches take a little longer.
Or you could look at putting those indexes on SmartStore. Again, this would depend on your setup/hosting/architecture etc.
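As a rough illustration of the tiered-disk idea on a self-managed indexer (volume names, mount points and sizes are all made up), indexes.conf volumes let you keep hot/warm buckets on fast disk and cold buckets on cheaper disk:

  # indexes.conf (illustrative only)
  [volume:fast]
  path = /mnt/ssd/splunk
  maxVolumeDataSizeMB = 2000000

  [volume:cheap]
  path = /mnt/nas/splunk
  maxVolumeDataSizeMB = 20000000

  [my_index]
  homePath   = volume:fast/my_index/db
  coldPath   = volume:cheap/my_index/colddb
  # thawedPath cannot reference a volume
  thawedPath = $SPLUNK_DB/my_index/thaweddb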



livehybrid
SplunkTrust

Sorry, I missed the point re Splunk Cloud - if you're using DDAA then yes, you will have delays in accessing this data. The best "fix" is to pay for more storage and keep the data in DDAS (Dynamic Data Active Searchable) for longer.

Related to the previously mentioned SmartStore stuff - if you send data to DDSS you can thaw it out, but this requires additional infrastructure and it won't necessarily allow searching from Splunk Cloud (unless you use Federated Search to an on-prem instance). See https://github.com/livehybrid/ddss-restore for more info on how this can be achieved, but it isn't for the faint of heart...


PickleRick
SplunkTrust

From the hints in the original question I assumed that we're talking about Cloud.


durnan13
Explorer

Yes, we are using Splunk Cloud today.


isoutamo
SplunkTrust
And probably AWS-based, not GCP or Azure?
And is your storage only DDAA, or also DDSS?

durnan13
Explorer

Yes, we are on AWS and only using DDAA currently.

I am really interested in Federated Search if we could perhaps migrate some indexes to S3 and just search them from Splunk there, instead of retaining all indexes for so long. However, I am not sure whether that is truly possible.


PickleRick
SplunkTrust

Federated Search for Amazon S3 is a different thing from "normal" Federated Search - that's the first important point. If you just say "federated search", the typical case of searching "remote" Splunk instances is implied.

So what you're talking about is this - https://help.splunk.com/en/splunk-cloud-platform/get-started/splunk-validated-architectures/splunk-p...

Read:
- Limitations
- Requirements
- Licensing

Generally, yes, it's possible to store your data in S3 buckets and search it from Splunk (as long as the requirements from the documents above are met), but you have to be very careful about your searches, since you're paying for the data scanned - not just the results of the search, but all data that had to be scanned to yield the result.


PickleRick
SplunkTrust

Thawing is an annoying process even on-prem (and with an on-prem installation you have the added element of having to find the right buckets to thaw), so it's understandable that you might want to keep your data searchable for longer. But of course more data means more storage, which means more $$$.


PickleRick
SplunkTrust

Interesting idea. I must admit I haven't dug into it deeply yet, but the nagging question is: if the buckets were already due to be frozen, wouldn't the CM re-freeze them as soon as it saw them? (Assuming they were frozen due to time constraints, not size limits.)
