All Apps and Add-ons

What block replication factor does Hunk use in its MR jobs? (under replicated blocks)

alexmc
Explorer

I am seeing lots of under-replicated blocks in my Hadoop cluster. Its main client is Hunk (HDP 2.2 and Hunk 6.6.1, I think).
When I do an hdfs fsck / I see that the blocks in question look like they were created by Hunk, e.g.:

/user/splunk/.staging/job_1424956467914_0015/job.jar: Under replicated BP-1255772799-10.34.37.1-1421676659908:blk_1073768329_27512. Target Replicas is 10 but found 5 replica(s).

(I used user splunk before switching on user proxying)

Now what I would like to know is: what is setting the target replica count to 10? I want to remove that, or manually lower it to 3. I only have 5 data nodes, so 10 copies of a block is impossible.
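For files that are already flagged as under-replicated, the target can be lowered after the fact with the standard HDFS tooling. A sketch, assuming the staging path from the fsck output above; adjust paths and the target factor to your cluster:

```
# Lower the replication factor of the existing staging files to 3;
# -R recurses, -w waits until replication actually reaches the target.
hdfs dfs -setrep -R -w 3 /user/splunk/.staging

# Re-check the affected path afterwards.
hdfs fsck /user/splunk/.staging -files -blocks
```

This only fixes files that already exist; new job submissions will still request the configured target unless the client-side setting is changed as well.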

I can't see anything suitable in indexes.conf.

1 Solution

apatil_splunk
Splunk Employee

Hunk uses the replication factor set in the configuration of the Hadoop client on the search head.
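The "Target Replicas is 10" in the fsck output matches Hadoop's own default for job submission files: mapreduce.client.submit.file.replication defaults to 10, which is why staging artifacts like job.jar request 10 replicas regardless of the cluster's dfs.replication. A sketch of the client-side override, assuming the search head's Hadoop client config lives in mapred-site.xml:

```
<!-- mapred-site.xml on the search head's Hadoop client
     (assumption: this is where your client configuration lives) -->
<property>
  <name>mapreduce.client.submit.file.replication</name>
  <!-- Hadoop's default is 10; set it to match your cluster, e.g. 3 -->
  <value>3</value>
</property>
```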


lloydd518
Path Finder

Hunk does use the replication factor set in the configuration of the Hadoop client on the search head. However, if you haven't set this to match the replication factor of your Hadoop cluster, the cluster will grumble about under-replicated blocks caused by your Hunk searches, because those searches will be requesting an arbitrary number like 10.

A way to override this, and to resolve the issue or prevent it in the future, is to add the following line to your virtual index provider settings:

vix.mapreduce.client.submit.file.replication = 1 (or whatever your Hadoop cluster's replication factor is)
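In indexes.conf terms, the override above goes in the provider stanza. A minimal sketch, where the stanza name "hadoop-provider" is hypothetical; use your own provider's name and your cluster's replication factor:

```
# indexes.conf on the Hunk search head
# ("hadoop-provider" is a placeholder for your provider stanza name)
[provider:hadoop-provider]
vix.mapreduce.client.submit.file.replication = 3
```

Setting it here keeps the fix scoped to Hunk jobs, without touching the Hadoop client's mapred-site.xml for other users of the same client.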



rdagan_splunk
Splunk Employee

Exactly. Hunk just uses the default unless you override these flags in the provider, for example vix.yarn.resourcemanager.classpath.


alexmc
Explorer

They weren't the issue - but it was good to check, thanks.

Further investigation showed that the problem was NOT limited to Hunk-created jobs; it also appeared on other MapReduce jobs unrelated to Hunk. So this is not a Hunk problem.

So, to answer my own question: it uses the default!

