
What are best practices for long-term storage?

ITBlogger
Explorer

Hi All,

We have a regulatory requirement to keep logs for 3 years. I am currently architecting our Splunk infrastructure and trying to figure out how far I should go to protect our logs from corruption, drive/array failures, etc.

In my current design I'm planning to have 6 load-balanced indexers (4 at the main site and 2 at the alternate site), each with an external drive enclosure holding 25 1.2TB drives. Each indexer will also be fitted with 8 300GB 15k rpm drives. I'm splitting each enclosure so that 10 drives are in RAID 10 and 14 are in RAID 5 (although I suppose this could instead be split into two RAID 5 arrays), which gives us a total of approx 91TB at the main site and 45TB at the alternate site.
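For anyone checking the capacity figures, here is a rough sketch of the usable-space arithmetic. It assumes decimal terabytes, no filesystem or hot-spare overhead, and that the 8 300GB drives in each indexer are also in RAID 10; that last point is not stated in the post and is only a guess that happens to line up with the quoted totals.

```python
def raid10_usable_tb(drives: int, size_tb: float) -> float:
    """RAID 10 mirrors every drive, so half of the raw capacity is usable."""
    return drives / 2 * size_tb


def raid5_usable_tb(drives: int, size_tb: float) -> float:
    """RAID 5 spends one drive's worth of capacity on parity."""
    return (drives - 1) * size_tb


# External enclosure per indexer: 10 drives in RAID 10 plus 14 in RAID 5.
enclosure_tb = raid10_usable_tb(10, 1.2) + raid5_usable_tb(14, 1.2)

# Assumption (not in the post): the 8 x 300GB drives are RAID 10 as well.
local_tb = raid10_usable_tb(8, 0.3)

per_indexer_tb = enclosure_tb + local_tb
main_site_tb = 4 * per_indexer_tb       # 4 indexers at the main site
alternate_site_tb = 2 * per_indexer_tb  # 2 indexers at the alternate site

print(f"per indexer:    {per_indexer_tb:.1f} TB")
print(f"main site:      {main_site_tb:.1f} TB")
print(f"alternate site: {alternate_site_tb:.1f} TB")
```

Under those assumptions the totals come out close to the approx 91TB and 45TB quoted above.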

This is enough storage to store data captured at 100GB per day for 3 years.
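As a quick sanity check on that claim, the raw volume over the retention period can be worked out as below. This is only a sketch: it uses decimal units and 365-day years, and it says nothing about how much space Splunk actually uses on disk, which depends on compression and index files.

```python
DAILY_INGEST_GB = 100        # from the post
RETENTION_DAYS = 3 * 365     # 3 years, ignoring leap days

raw_total_gb = DAILY_INGEST_GB * RETENTION_DAYS
raw_total_tb = raw_total_gb / 1000  # decimal terabytes

print(f"raw log volume over 3 years: {raw_total_tb:.1f} TB")

# Compare against the approx 91TB quoted for the main site.
MAIN_SITE_TB = 91
print(f"raw volume / main-site capacity: {raw_total_tb / MAIN_SITE_TB:.2f}")
```

The raw volume is larger than the main-site figure on its own, so the plan relies on the on-disk size being smaller than the raw logs, or on the alternate site's capacity counting towards the total.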

This design, of course, only protects us from a certain level of drive failures. It doesn't protect us from log data corruption.

From a regulatory perspective, we get slammed hard if we lose more than a reasonable number of logs due to corruption, but I'm not sure that warrants building out a separate backup network and doubling storage using a SAN in order to protect against the small chance of data corruption.

Is my current design reasonable? Let me know what you think.

Regards,

Alex


kristian_kolb
Ultra Champion

A little bit off-topic, but could be interesting all the same.

If this is only for regulatory purposes, i.e. you do not really want to keep all those logs online and searchable, not even for an auditor to look at them, you should probably think about aging out your indexes to frozen after some time, say one year. Then you can have the frozen buckets on backup for another 2 years before deleting them permanently.
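A minimal sketch of what that could look like in practice: the retention age is set per index in indexes.conf through `frozenTimePeriodInSecs`, and `coldToFrozenDir` tells Splunk to copy buckets to an archive directory instead of deleting them when they freeze. The index name and archive path below are made up for illustration, and the script only prints a stanza to review; it does not touch any Splunk configuration.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def frozen_period_secs(days: int) -> int:
    """Convert a retention period in days to the seconds value Splunk expects."""
    return days * SECONDS_PER_DAY


# Hypothetical index name and archive location, for illustration only.
INDEX_NAME = "regulatory_logs"
ARCHIVE_DIR = "/backup/splunk/frozen/regulatory_logs"

stanza = "\n".join([
    f"[{INDEX_NAME}]",
    # Buckets whose newest event is older than this are frozen.
    f"frozenTimePeriodInSecs = {frozen_period_secs(365)}",
    # Archive frozen buckets here rather than deleting them.
    f"coldToFrozenDir = {ARCHIVE_DIR}",
])

print(stanza)
```

Note that Splunk does not manage the archive directory afterwards, so deleting frozen buckets after the further two years would have to be handled by the backup system or a separate job.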

The main point is that frozen buckets only take up around 10-15% (on average) of the original log size, whereas warm/cold buckets average around 50% of the original size and can in some cases be larger than the original logs, because of the .tsidx files that make them searchable. Frozen buckets do not keep the .tsidx files, so the events in them are not directly searchable, but the .tsidx files can be rebuilt should the need arise.
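To put rough numbers on that, the sketch below compares keeping all three years searchable with keeping one year searchable and two years frozen. It takes the 100GB per day from the question and the average ratios quoted above (about 50% for warm/cold, 10-15% for frozen); real ratios vary with the data, so treat the output as a ballpark only.

```python
DAILY_INGEST_GB = 100          # from the question
DAYS_PER_YEAR = 365

SEARCHABLE_RATIO = 0.50        # warm/cold buckets, average quoted above
FROZEN_RATIO_LOW = 0.10        # frozen buckets, low end of quoted range
FROZEN_RATIO_HIGH = 0.15       # frozen buckets, high end of quoted range


def on_disk_tb(years: int, ratio: float) -> float:
    """Estimated on-disk size in decimal TB for the given period and ratio."""
    return DAILY_INGEST_GB * DAYS_PER_YEAR * years * ratio / 1000


all_searchable_tb = on_disk_tb(3, SEARCHABLE_RATIO)
one_year_searchable_tb = on_disk_tb(1, SEARCHABLE_RATIO)
frozen_low_tb = on_disk_tb(2, FROZEN_RATIO_LOW)
frozen_high_tb = on_disk_tb(2, FROZEN_RATIO_HIGH)

print(f"3 years searchable:        {all_searchable_tb:.2f} TB")
print(f"1 year searchable:         {one_year_searchable_tb:.2f} TB")
print(f"2 years frozen (estimate): {frozen_low_tb:.2f} to {frozen_high_tb:.2f} TB")
print(
    "tiered total (estimate):   "
    f"{one_year_searchable_tb + frozen_low_tb:.2f} to "
    f"{one_year_searchable_tb + frozen_high_tb:.2f} TB"
)
```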

As for the risk of tampering with or losing logs, you should probably ask your auditor what they consider acceptable. AFAIK, you cannot use data block signing if your index is spread across more than one indexer.

http://docs.splunk.com/Documentation/Splunk/5.0.2/Security/ITDataSigning

hope this helps,

k

ITBlogger
Explorer

Yeah, I saw that too when I hit the link you gave above. Interesting stuff.


kristian_kolb
Ultra Champion

You can still do event hashing, I believe. See the docs page right next to the one linked above.


ITBlogger
Explorer

Thanks, that is excellent info to know. Too bad index data signing isn't currently supported with distributed search, as distributed search is one of the things that makes Splunk so elegant.
