Is there a way to implement data integrity control on Hadoop archives?

jcampbell1977
Explorer

We are working through compliance requirements, and I need to verify the integrity of our data. How would I go about doing this on a virtual index? We are using Hadoop to read the data from S3 in AWS.

dwaddle
SplunkTrust

You really can't use the Splunk Data Integrity feature with Hadoop virtual index data. Splunk did not process that data through its own indexing pipeline, so the Data Integrity features that depend on the pipeline are not available. You would have to build your own file hashing and signing solution for HDFS and make sure it checks all of your compliance boxes.
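As a rough illustration of the "own file hashing and signing" idea, here is a minimal Python sketch. It is not a Splunk or Hadoop feature: the function names (`sha256_file`, `build_manifest`, `sign_manifest`, `verify`) are made up for this example, it assumes the archived files are reachable as a local or mounted directory, and it uses an HMAC with a shared secret key as a stand-in for a real signing scheme. Whether that is enough for a given compliance regime is something to confirm with your auditors.

```python
import hashlib
import hmac
import json
from pathlib import Path


def sha256_file(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_file(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def sign_manifest(manifest, key):
    """Return an HMAC-SHA256 tag over a canonical JSON encoding of the manifest."""
    payload = json.dumps(manifest, sort_keys=True).encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify(root, manifest, signature, key):
    """Return a list of problems; an empty list means everything matched."""
    # Check the manifest itself first, since a tampered manifest proves nothing.
    if not hmac.compare_digest(sign_manifest(manifest, key), signature):
        return ["manifest signature mismatch"]
    current = build_manifest(root)
    problems = []
    for name, digest in manifest.items():
        if name not in current:
            problems.append(f"missing: {name}")
        elif current[name] != digest:
            problems.append(f"changed: {name}")
    problems.extend(f"unexpected: {name}" for name in current if name not in manifest)
    return problems
```

In practice you would build and sign the manifest at archive time, keep the manifest and the key somewhere other than the storage that holds the data, and rerun `verify` on a schedule.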

jcampbell1977
Explorer

Well, the data did go through Splunk first. We roll our data off to AWS S3 as our archiving solution and use Analytics for Hadoop to read it through Splunk.
Are you aware of any file hashing components for HDFS?

dwaddle
SplunkTrust

Oh, well, that changes things a bit! If the data went through Splunk's Data Integrity features as it was indexed, before it wound up in HDFS, then the hashes made when the bucket rolled from hot to warm should still be valid for the copy of the bucket in S3. The question becomes whether there is a way to check those, and I'm honestly not sure.
