Hunk bucket archive question?

tsunamii · ‎03-22-2016

When HUNK does its bucket pushes to HDFS, it also pushes a couple small supporting files, metadata, etc... With Hadoop's issues handling small files, I was wondering if that is something that's been looked at or not?

For using the HUNK archiving on clustered indexers, I understand that if the buckets have 2 or greater searchable copies of the data, searches against that archived data will return duplicated results, is that correct?

hsesterhenn · ‎03-22-2016

Hi

1) jepp. There will be some overhead. Since there are not millions of buckets I don't think this will be an issue...

2) Every indexer will try to copy "it's own" bucket to HDFS. If there is already a valid copy of this bucket the indexer will skip this.
There should be no duplicates.

HTH,

Holger

Hunk bucket archive question?

Everything Community at .conf24!

Index This | I’m short for "configuration file.” What am I?

New Articles from Academic Learning Partners, Help Expand Lantern’s Use Case Library, ...