Getting Data In

Hunk bucket archive question?

tsunamii
Path Finder

When HUNK does its bucket pushes to HDFS, it also pushes a couple small supporting files, metadata, etc... With Hadoop's issues handling small files, I was wondering if that is something that's been looked at or not?

For using the HUNK archiving on clustered indexers, I understand that if the buckets have 2 or greater searchable copies of the data, searches against that archived data will return duplicated results, is that correct?

0 Karma

hsesterhenn
Path Finder

Hi

1) jepp. There will be some overhead. Since there are not millions of buckets I don't think this will be an issue...

2) Every indexer will try to copy "it's own" bucket to HDFS. If there is already a valid copy of this bucket the indexer will skip this.
There should be no duplicates.

HTH,

Holger

0 Karma
Get Updates on the Splunk Community!

.conf24 | Registration Open!

Hello, hello! I come bearing good news: Registration for .conf24 is now open!   conf is Splunk’s rad annual ...

Splunk is officially part of Cisco

Revolutionizing how our customers build resilience across their entire digital footprint.   Splunk ...

Splunk APM & RUM | Planned Maintenance March 26 - March 28, 2024

There will be planned maintenance for Splunk APM and RUM between March 26, 2024 and March 28, 2024 as ...