Is there a way to implement data integrity control...

jcampbell1977 · ‎02-22-2017

We are going through a compliance review and I need to verify the integrity of our data. How would I go about doing this on a virtual index? We are using Hadoop to read the data from S3 in AWS.

dwaddle · ‎02-22-2017

You really can't use the Splunk Data Integrity feature with Hadoop virtual index data. Splunk did not process that data through its own indexing pipeline, so the integrity hashes that the pipeline generates don't exist for it. You would have to build your own file hashing and signing solution for HDFS and make sure it ticks all of your compliance checkboxes.
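
To give an idea of what a home-grown hashing and signing solution might look like, here is a minimal sketch in Python. It is not a Splunk or Hadoop feature. It assumes the files are reachable as a local directory (for example, copied down or mounted from HDFS), and the function names (`build_manifest`, `sign_manifest`, `verify`) are made up for illustration. It uses HMAC-SHA256 with a shared secret as the "signature"; your compliance requirements may call for asymmetric signatures and proper key management instead.

```python
import hashlib
import hmac
import json
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    return {
        p.relative_to(root).as_posix(): sha256_file(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def sign_manifest(manifest: dict, key: bytes) -> str:
    """HMAC-SHA256 over a canonical JSON encoding of the manifest."""
    payload = json.dumps(manifest, sort_keys=True, separators=(",", ":")).encode()
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify(root: Path, manifest: dict, signature: str, key: bytes) -> list:
    """Return the problems found; an empty list means everything matched."""
    # Check the manifest itself first, so a tampered manifest is never trusted.
    if not hmac.compare_digest(sign_manifest(manifest, key), signature):
        return ["manifest signature mismatch"]
    current = build_manifest(root)
    problems = []
    for name, digest in manifest.items():
        if name not in current:
            problems.append(f"missing: {name}")
        elif current[name] != digest:
            problems.append(f"modified: {name}")
    problems.extend(f"unexpected: {n}" for n in current if n not in manifest)
    return problems
```

You would build and sign the manifest when the data is archived, store the manifest and signature somewhere other than the archived directory itself, and run `verify` later; any entry in the returned list is a file that is missing, changed, or was not there at signing time.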

jcampbell1977 · ‎02-23-2017

Well, the data did go through Splunk first. We are rolling our data off to AWS S3 as our archiving solution and using Analytics for Hadoop to read it back through Splunk.
Are you aware of any file hashing components for HDFS?

dwaddle · ‎02-23-2017

Oh, well, that changes it a little bit, I guess! If the data went through Splunk's Data Integrity features as it was indexed, before it wound up in HDFS, then the hashes made when the bucket rolled from hot to warm should still be valid on the copy of the bucket in S3. The question becomes whether there is a way to check those. And I'm honestly not sure.
