Why are my thawing frozen buckets not restoring as expected?

dschmidt_cfi
Path Finder

Running Splunk 7.0.2 in a distributed environment with 3 clustered indexers. I'm trying to restore frozen data to my stand-alone test environment. As a test, I recovered two different db_ buckets from tape, put them into the thaweddb directory of a new thawed_test index, and then rebuilt them. What I appear to be getting is logs of logs. Yes, under /web_access.log there were web logs, but not the originals; rather, they show how Splunk saw them at index time.

I may have done something wrong in the initial archiving, or am missing something, but I need the original records, not a view of what they may have been.

dschmidt_cfi
Path Finder

PS originally set this up at the end of 2014 and it was dropped in my lap. Until this exercise I did not realize this error of sorts: everything frozen was just dumped into /opt/splunk/var/lib/frozen without the parent (index) directory. I need to correct this going forward.

martin_mueller
SplunkTrust

Yeah, getting all the data and then estimating the index based on source/sourcetype/host values should give you what you need, albeit quite manually.
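The manual approach described above, which infers an index from source, sourcetype and host, can be sketched as a first-match rule table applied to exported events. This is a minimal sketch under stated assumptions. The `RULES` entries are hypothetical: apart from `was` and the Splunk log directory taken from this thread, the patterns and index names are placeholders. Events are assumed to be plain dicts with `source`, `sourcetype` and `host` keys, for example rows exported from a search over the thawed index.

```python
import fnmatch
from typing import Mapping, Optional, Tuple

# Hypothetical rules: (field, glob pattern, index). The first matching rule wins.
RULES: Tuple[Tuple[str, str, str], ...] = (
    ("source", "/opt/splunk/var/log/splunk/*", "_internal"),
    ("sourcetype", "websphere*", "was"),  # assumed sourcetype naming
    ("host", "web*", "web"),              # hypothetical web index
)


def guess_index(event: Mapping[str, str],
                rules: Tuple[Tuple[str, str, str], ...] = RULES) -> Optional[str]:
    """Return the index of the first rule whose pattern matches the event, else None."""
    for field, pattern, index in rules:
        value = event.get(field)
        if value is not None and fnmatch.fnmatchcase(value, pattern):
            return index
    return None
```

Ordering the rules from most to least specific keeps the result predictable. Events that match nothing come back as `None` so they can be reviewed by hand.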

martin_mueller
SplunkTrust

Indeed, the parent folders of a bucket are extremely important. As far as I know, a bucket doesn't know which index it belongs to, other than by where it is stored in the directory tree.
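Because the index is implied only by the directory tree, a script can recover it from the bucket's path, provided that path still carries the information. The minimal sketch below assumes two layouts. The first is a live index (`<index dir>/{db,colddb,thaweddb}/<bucket>`). The second is a frozen archive with one subfolder per index (`<archive_root>/<index dir>/<bucket>`). The function name and the `archive_root` parameter are hypothetical. Note that the result is the index's *directory* name, which is not always the index name; for example, `main` lives in `defaultdb` by default.

```python
from pathlib import PurePosixPath
from typing import Optional

# Default per-index bucket directories: <index dir>/{db,colddb,thaweddb}/<bucket>
BUCKET_PARENTS = {"db", "colddb", "thaweddb"}


def index_dir_for_bucket(bucket_path: str,
                         archive_root: Optional[str] = None) -> Optional[str]:
    """Return the index directory implied by a bucket's location, or None.

    Handles two layouts:
      <...>/<index dir>/{db,colddb,thaweddb}/<bucket>   (live index)
      <archive_root>/<index dir>/<bucket>               (per-index frozen archive)
    A flat archive (<archive_root>/<bucket>) carries no index information.
    """
    parent = PurePosixPath(bucket_path).parent
    if parent.name in BUCKET_PARENTS:
        return parent.parent.name or None
    if archive_root is not None and parent.parent == PurePosixPath(archive_root):
        return parent.name
    return None
```

A flat dump like the one in this thread returns `None`, which is exactly the information that has been lost.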

dschmidt_cfi
Path Finder

Well, I guess we can close this one and I will have to hand in my noob card. PS originally set this up at the end of 2014 and it was dropped in my lap. Until this exercise I did not realize this error of sorts: everything frozen was just dumped into /opt/splunk/var/lib/frozen without the parent (index) directory. At least there are only about 65 indexes to fix at this point, and I can do that while I am moving indexes out of main as part of my cleanup 2.0.

I don't know whether dumping everything between the epoch goalposts will get me what I need, but I do want to thank everyone for showing me the way forward. I will make sure to ++ you all in Slack if the bot is working. Now to start clawing my way back up to noob status.
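Going forward, the fix is to keep the index as a parent folder whenever a bucket is archived. Below is a minimal sketch of a copy helper that does this. The function name, its arguments, and the `<archive_root>/<index>/<bucket>` layout are assumptions for illustration, not Splunk's own mechanism. Pointing each index's `coldToFrozenDir` in indexes.conf at its own subdirectory may achieve the same result natively.

```python
import shutil
from pathlib import Path


def archive_bucket(bucket_dir: str, index_name: str, archive_root: str) -> Path:
    """Copy a frozen bucket to <archive_root>/<index_name>/<bucket name>.

    Keeping the index as the parent directory lets a later restore put the
    bucket back into the right index's thaweddb.
    """
    src = Path(bucket_dir)
    dest = Path(archive_root) / index_name / src.name
    if dest.exists():
        # Refuse to overwrite a bucket that was already archived.
        raise FileExistsError(dest)
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.copytree(src, dest)
    return dest
```

The caller has to supply the index name, for example from the index's home directory, before the bucket leaves it.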

martin_mueller
SplunkTrust

Can you provide some more details without publishing sensitive info?

If you need fast, in-depth help on an urgent recovery matter, I'd recommend support, PS, or your local Splunk Partner rather than Answers. They can sign NDAs, touch your actual systems, etc.
If you're just testing or learning stuff then carry on 🙂

dschmidt_cfi
Path Finder

Basically, this is both a learning exercise and a recovery operation (if possible); I have never done this before. My process is that when the frozen buckets area goes above 80% on any of the indexers, I get a notice, which gives me a couple of days to respond. I wrote a Python script that lists all frozen buckets from oldest to newest, and at present I am archiving 60 GB to tape at a time.
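The list-then-batch process just described can be sketched as follows. This is not the poster's actual script. It sorts bucket directories by the oldest-event epoch encoded in their names and groups them greedily into batches of at most 60 GB. The assumptions are that every bucket sits directly under the frozen directory and that "60 GB" means 60 GiB. The function names are hypothetical.

```python
import os
import re
from typing import Iterable, Iterator, List, Tuple

# Bucket directory names: db_<newest>_<oldest>_<local id>[_<guid>] (epoch seconds).
# rb_ is the prefix for replicated copies in an indexer cluster.
BUCKET_RE = re.compile(r"^(?:db|rb)_(\d+)_(\d+)_(\d+)(?:_.+)?$")
BATCH_LIMIT = 60 * 1024**3  # assumed: 60 GB per tape run, counted in GiB

Bucket = Tuple[int, str, int]  # (oldest event epoch, path, size in bytes)


def dir_size(path: str) -> int:
    """Total size in bytes of all files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total


def frozen_buckets_oldest_first(frozen_dir: str) -> List[Bucket]:
    """List bucket directories directly under frozen_dir, oldest events first."""
    buckets = []
    with os.scandir(frozen_dir) as entries:
        for entry in entries:
            m = BUCKET_RE.match(entry.name)
            if m and entry.is_dir():
                buckets.append((int(m.group(2)), entry.path, dir_size(entry.path)))
    return sorted(buckets)


def batches(buckets: Iterable[Bucket], limit: int = BATCH_LIMIT) -> Iterator[List[str]]:
    """Group buckets, in order, into batches of at most `limit` bytes.

    A single bucket larger than `limit` ends up alone in its own batch.
    """
    batch: List[str] = []
    used = 0
    for _oldest, path, size in buckets:
        if batch and used + size > limit:
            yield batch
            batch, used = [], 0
        batch.append(path)
        used += size
    if batch:
        yield batch
```

Sorting on the name rather than on file modification time keeps the order tied to event time, which is what later restores will be searched by.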

On the first test I grabbed one bucket at random and restored it from tape to my sandbox, after creating the required subfolders in the thaweddb directory of my new test index:
db_1533439042_1533265909_223943_66C8CECB-2830-4A2B-BB6E-92F896DB305F
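A bucket name like the one above carries its own event time range: `db_<newest>_<oldest>_<local id>_<guid>`, with the times as epoch seconds. In clustered buckets the GUID identifies the originating peer. The name says nothing about the index or hosts, though. The minimal parser below decodes it. `BucketName` and `parse_bucket_name` are hypothetical names.

```python
import re
from datetime import datetime, timezone
from typing import NamedTuple

_NAME_RE = re.compile(r"^(?:db|rb)_(\d+)_(\d+)_(\d+)(?:_(.+))?$")


class BucketName(NamedTuple):
    newest: datetime  # latest event time in the bucket (UTC)
    oldest: datetime  # earliest event time in the bucket (UTC)
    local_id: int
    guid: str         # originating indexer GUID, "" if absent


def _utc(epoch: str) -> datetime:
    return datetime.fromtimestamp(int(epoch), tz=timezone.utc)


def parse_bucket_name(name: str) -> BucketName:
    """Decode db_/rb_ bucket directory names into their components."""
    m = _NAME_RE.match(name)
    if m is None:
        raise ValueError(f"not a bucket directory name: {name!r}")
    newest, oldest, local_id, guid = m.groups()
    return BucketName(_utc(newest), _utc(oldest), int(local_id), guid or "")
```

For the bucket above, this gives an event range of 2018-08-03 03:11:49 to 2018-08-05 03:17:22 UTC, which is roughly two days of data.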

What I get when I search the data at the index level is logs, not individual records, from sources such as:

/opt/splunk/var/log/splunk/apifilesave.log
/opt/splunk/var/log/splunk/app_imports_update.log
/opt/splunk/var/log/splunk/app_permissions_manager.log
/opt/splunk/var/log/splunk/configuration_check.log

These contain information, but more as the indexer sees it. The only hosts listed are my Splunk servers; no outside servers, such as the web servers, appear.

Hope this helps.

FrankVl
Ultra Champion

And are you sure you are restoring a bucket that belonged to an index that actually contained logs from your web servers, and not some random bucket that just happens to contain Splunk's internal logs?

dschmidt_cfi
Path Finder

I must be the "luckiest" person ever and should be flipping quarters for a living, because I managed to pull two buckets of internal logs from different time spans. It is a good question: is there any way of telling what is inside a bucket? I am looking for a specific output (WebSphere System Out) that is part of a specific index (was) within a time frame.

Everything I see is basically the standard path /opt/splunk/var/lib/frozen//rawdata/journal.gz, with no real indicator of the contents. I have been scripting to select those primary buckets that fall within either the earliest or the latest epoch I need. Even with both buckets now onboarded, the hosts list is still just the 3 SHs, the DMC, and this indexer, NDX2.
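Selecting buckets by epoch is an interval-overlap test. A bucket is relevant if its `[oldest, newest]` range intersects the requested `[earliest, latest]` window. This covers buckets that contain either goalpost as well as those that fall entirely between them. Below is a minimal sketch, assuming plain bucket directory names as input; `buckets_in_window` is a hypothetical name. A time match alone cannot tell an `_internal` bucket from a `was` bucket, so this filter only helps once the index is known.

```python
import re
from typing import Iterable, List

# db_<newest>_<oldest>_<local id>[_<guid>], times in epoch seconds
_NAME_RE = re.compile(r"^(?:db|rb)_(\d+)_(\d+)_\d+(?:_.+)?$")


def buckets_in_window(names: Iterable[str], earliest: int, latest: int) -> List[str]:
    """Keep buckets whose [oldest, newest] event range overlaps [earliest, latest]."""
    selected = []
    for name in names:
        m = _NAME_RE.match(name)
        if not m:
            continue  # skip anything that is not a bucket directory
        newest, oldest = int(m.group(1)), int(m.group(2))
        if oldest <= latest and newest >= earliest:
            selected.append(name)
    return selected
```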

I need to rewrite the early-2015 Python script now that I understand Splunk better, but that is water under the bridge at this point.

FrankVl
Ultra Champion

I'm not an expert on this stuff, but buckets are usually stored in a folder structure that follows your index structure, right? So that should provide a way to determine what index a bucket belonged to. Did you not retain that folder structure when archiving the frozen buckets?
