Hunk bucket archive question?

When Hunk pushes buckets to HDFS, it also pushes a couple of small supporting files (metadata, etc.). Given Hadoop's known problems with large numbers of small files, has that been looked at?

When using Hunk archiving on clustered indexers, I understand that if the buckets have two or more searchable copies, searches against the archived data will return duplicate results. Is that correct?

Hi

1) Yep, there will be some overhead. But since we are not talking about millions of buckets, I don't think it will be an issue.

2) Every indexer tries to copy its own bucket to HDFS. If a valid copy of that bucket already exists there, the indexer skips it, so there should be no duplicates.
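To put point 1 in rough numbers, here is a back-of-the-envelope estimate of NameNode memory. It assumes the commonly quoted rule of thumb of about 150 bytes of NameNode heap per HDFS namespace object (file, directory or block), one directory per archived bucket, one block per small file, and a hypothetical `files_per_bucket` count. None of these figures come from Hunk itself.

```python
# Back-of-the-envelope NameNode heap estimate for archived buckets.
# ASSUMPTIONS (illustrative, not from Splunk/Hunk docs):
#   - ~150 bytes of NameNode heap per namespace object (rule of thumb)
#   - one HDFS directory per archived bucket
#   - files_per_bucket is a hypothetical count of files per bucket
#   - every small file fits in exactly one HDFS block
BYTES_PER_OBJECT = 150


def namenode_bytes(num_buckets: int, files_per_bucket: int) -> int:
    """Estimate NameNode heap (bytes) consumed by archived buckets."""
    dirs = num_buckets                       # one directory per bucket
    files = num_buckets * files_per_bucket   # file objects
    blocks = files                           # one block object per small file
    return (dirs + files + blocks) * BYTES_PER_OBJECT


if __name__ == "__main__":
    for buckets in (10_000, 1_000_000):
        mib = namenode_bytes(buckets, files_per_bucket=4) / 2**20
        print(f"{buckets:>9,} buckets -> ~{mib:,.1f} MiB of NameNode heap")
```

With 4 files per bucket this works out to roughly 13 MiB for 10,000 buckets and roughly 1,290 MiB for a million, which is why the total bucket count matters far more than the few extra files per bucket.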
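The "skip if a valid copy already exists" behaviour from point 2 can be sketched as follows. This is a hypothetical illustration, not Splunk's code: an in-memory dict stands in for HDFS, and the bucket ID, the checksum used as the "valid copy" check, and all class and function names are assumptions made for the example.

```python
# Hypothetical sketch of "archive my copy unless a valid copy already exists".
# NOT Splunk's implementation: FakeHdfs is an in-memory stand-in for HDFS and
# the checksum comparison is an assumed stand-in for Hunk's validity check.
from dataclasses import dataclass


@dataclass(frozen=True)
class BucketCopy:
    bucket_id: str  # identical on every indexer that holds a replica
    checksum: str   # assumed "is the archived copy valid?" marker


class FakeHdfs:
    def __init__(self) -> None:
        # bucket_id -> (checksum, indexer that uploaded it)
        self.archived: dict[str, tuple[str, str]] = {}

    def has_valid_copy(self, copy: BucketCopy) -> bool:
        entry = self.archived.get(copy.bucket_id)
        return entry is not None and entry[0] == copy.checksum

    def put(self, copy: BucketCopy, indexer: str) -> None:
        self.archived[copy.bucket_id] = (copy.checksum, indexer)


def archive(indexer: str, copy: BucketCopy, hdfs: FakeHdfs) -> bool:
    """Return True if this indexer uploaded the bucket, False if it skipped."""
    if hdfs.has_valid_copy(copy):
        return False  # another indexer already archived this bucket
    hdfs.put(copy, indexer)
    return True


# Three indexers each hold a replica of the same bucket.
hdfs = FakeHdfs()
replica = BucketCopy(bucket_id="bucket-42", checksum="abc")
results = {idx: archive(idx, replica, hdfs) for idx in ("idx1", "idx2", "idx3")}
print(results)             # {'idx1': True, 'idx2': False, 'idx3': False}
print(len(hdfs.archived))  # 1
```

Only the first indexer uploads; the others find a valid copy and skip, so HDFS ends up with a single copy and searches over the archive see each event once. The sketch runs the indexers one after another; a real cluster would need an atomic "create if absent" step (for example, writing to a temporary path and renaming) to keep two indexers from racing.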

HTH,

Holger