Splunk Enterprise

Splunk with Archive Solution

sscholz
Explorer

Hello Community,

I have to build a tamper-proof archive solution for data ingested into Splunk. I have been thinking about it for the last couple of days, and I would appreciate your ideas or, ideally, advice based on known/proven best practice.

The idea is to forward or store Splunk-indexed data in a tamper-proof (and non-deletable) way, so that I can be sure the data CANNOT be altered anymore.

Recently I built this with an indexer forwarding to a syslog server (syslog format); the data is then copied to WORM storage. But I am not convinced that this is the ideal solution. It works, but there are a few too many possible points of failure in the chain.

The other idea is to use the data integrity control feature to ensure that the data has not been altered and is still valid. If I am right, the indexed data can then only be deleted, but not altered? I am not fully convinced of this idea either, because I would have to handle the checksum files, and that could be a lot with 250 GB of indexed data per day.

In sum, there are two ideas:

Target: tamper-proof/non-deletable storage of indexed events // a bonus would be fully secured transport of the data

1. IDX forward (syslog format) -> syslog server -> copy to WORM storage

2. Use the data integrity control feature -> store the checksums in WORM storage, since the data itself can only be deleted.

I hope some of you have built such an archive solution in the past and can help me out.

BR, Tom

1 Solution

isoutamo
SplunkTrust
SplunkTrust

Hi

Some comments and my own opinions.

IMHO: as long as the data can somehow be accessed/modified from the servers or over the network, I don't call it an archive. Even the files in Splunk warm and cold buckets can be edited at the OS level if someone really wants to. Those checksum files can also be edited unless they are on WORM storage. I think both of your options are OK; both have their pros and cons.

1) You could later index/use that data with tools other than Splunk. On the other hand, this creates additional requirements, and this system needs to be running all the time, otherwise your Splunk will stop.
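For reference, the forwarding leg of option 1 is configured on the indexers in outputs.conf; a minimal sketch could look roughly like this (the server name and port are placeholders, not from the original post):

```
# outputs.conf - send a copy of events in syslog format
[syslog]
defaultGroup = worm_syslog

[syslog:worm_syslog]
# Hypothetical syslog server sitting in front of the WORM storage
server = syslog-worm.example.com:514
type = tcp
```

From there, the copy job from the syslog server to WORM storage is the part outside of Splunk, and the main place where the extra failure points come in.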

2) This needs some scripting to copy the checksum files into WORM storage as soon as they are created. The creation itself can be done automatically by Splunk (see indexes.conf / enableDataIntegrityControl + signtool).
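A minimal sketch of that setting, assuming a hypothetical index name; Splunk then writes the hash files (l1Hashes/l2Hash) into each bucket's rawdata directory:

```
# indexes.conf on the indexers
[archive_worm]
homePath   = $SPLUNK_DB/archive_worm/db
coldPath   = $SPLUNK_DB/archive_worm/colddb
thawedPath = $SPLUNK_DB/archive_worm/thaweddb
# Generate integrity hash files for each bucket's rawdata
enableDataIntegrityControl = true
```

Verification is then a CLI call on the indexer, e.g. `splunk check-integrity -index archive_worm`.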

A third option is to use a real archiving system to store the needed events outside of Splunk.

r. Ismo


scelikok
SplunkTrust
SplunkTrust

Hi @sscholz,

I want to add to @isoutamo's comments regarding the disk space required for the checksum files.

Splunk stores a 32-byte hash for every slice, and the default slice size is 128 KB.
So for 250 GB/day of ingest you will need about 62.5 MB per day for checksums. You should also multiply this size by the replication factor.
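That figure can be checked with a quick back-of-the-envelope calculation:

```python
slice_size = 128 * 1024        # default slice size: 128 KB, in bytes
hash_size = 32                 # one 32-byte hash per slice
daily_ingest = 250 * 1024**3   # 250 GB/day, in bytes

slices_per_day = daily_ingest // slice_size   # 2,048,000 slices
checksum_bytes = slices_per_day * hash_size   # 65,536,000 bytes

print(checksum_bytes / 1024**2)  # 62.5 (MB per day, before replication)
```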

 

If this reply helps you, an upvote and "Accept as Solution" are appreciated.


sscholz
Explorer

Thank you.

So I think I have to stick with my syslog solution. 😕

BR, Tom
