Is there a way to implement data integrity control on Hadoop archives?

jcampbell1977
Explorer

We are going through a compliance review and I need to verify the integrity of our data. How would I go about doing this on a virtual index? We are using Hadoop to read the data from S3 in AWS.

dwaddle
SplunkTrust

You really can't use the Splunk Data Integrity feature with Hadoop virtual index data. Splunk did not process that data through its own indexing pipeline, so the Data Integrity features of the indexing pipeline never ran on it. You would have to build your own file hashing and signing solution for HDFS and make sure it ticks all of your compliance check boxes.
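
As a rough starting point, a home-grown hashing and signing step could look something like the sketch below. This is not a Splunk or Hadoop feature. The function names (`hash_file`, `build_manifest`, `sign_manifest`, `verify_manifest`), the manifest layout, and the shared-secret HMAC key are all assumptions made for illustration, and the code works on a local directory rather than talking to HDFS or S3 directly.

```python
import hashlib
import hmac
import json
from pathlib import Path


def hash_file(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file path (relative to root) to its SHA-256 digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): hash_file(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def sign_manifest(manifest, key):
    """Return an HMAC-SHA256 signature over a canonical JSON encoding."""
    payload = json.dumps(manifest, sort_keys=True).encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify_manifest(root, manifest, signature, key):
    """Check the signature first, then re-hash the files and compare."""
    if not hmac.compare_digest(sign_manifest(manifest, key), signature):
        return False
    return build_manifest(root) == manifest
```

The idea would be to build and sign the manifest when the data is archived, keep the manifest and signature somewhere the archive writers cannot modify, and run the verify step against a retrieved copy whenever an auditor asks. Whether an HMAC with a shared key is enough, or you need real public-key signatures, depends on what your compliance regime requires.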

jcampbell1977
Explorer

Well, the data did go through Splunk first. We are rolling our data off to AWS S3 as our archiving solution and using Analytics for Hadoop to read it back through Splunk.
Are you aware of any file hashing components for HDFS?

dwaddle
SplunkTrust

Oh, well, that changes things a little bit, I guess! If the data went through Splunk's Data Integrity features as it was indexed, before it wound up in HDFS, then the hashes made when the bucket rolled from hot to warm should still be valid on the copy of the bucket in S3. The question becomes whether there is a way to check those, and I'm honestly not sure.
