This problem was pretty nasty to troubleshoot. From the stack traces above, it seems the resource localizer is trying to use Kerberos to authenticate against the NameNode. It fails (as it should) because there is no keytab file for it to find. After job submission there should be no need for keytab files, because from that point on the job must use Hadoop's delegation tokens. In this case the Hunk server was properly getting the delegation tokens from the NameNode, as can be seen in the following log line:
DEBUG ERP.REDACTED - SecurityUtil - Acquired token Ident: 00 16 ... 05 8b, Kind: HDFS_DELEGATION_TOKEN, Service: [external-ip-address]:8020
However, the delegation token was issued for a service identified by the external IP of the NameNode (as can be seen from Service: [external-ip-address]:8020), whereas the resource localizer communicates with the NameNode over an internal address ("node1234.redacted.com/10.123.123.123"; destination host is: "namenode.redacted.com":8020;). The localizer therefore presumably found no token matching the service it was talking to and fell back to Kerberos.
Thus the root cause of the problem was a mismatch between the client's and the cluster's value of hadoop.security.token.service.use_ip. The fix was to set the following flag in the provider:
[REDACTED]
....
vix.hadoop.security.token.service.use_ip = false
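
To illustrate why the mismatch matters, here is a minimal Python sketch of token lookup keyed by service name. It is a simplified model and not Hadoop's actual code: the function names, the host name and the IP address are all hypothetical stand-ins for the redacted values above. The only assumption carried over from Hadoop is that the service key is "ip:port" when use_ip is true and "hostname:port" when it is false.

```python
# Simplified model of delegation token lookup by service name.
# This is NOT Hadoop's implementation; all names and values are hypothetical.

def build_token_service(host: str, ip: str, port: int, use_ip: bool) -> str:
    """Return the key a token is stored or looked up under."""
    return f"{ip}:{port}" if use_ip else f"{host}:{port}"


def select_token(credentials: dict, service: str):
    """Return the token registered for this service, or None if there is none."""
    return credentials.get(service)


# Hypothetical NameNode address standing in for the redacted one.
NN_HOST, NN_IP, NN_PORT = "namenode.example.com", "203.0.113.10", 8020

# Client side with use_ip = true: the token is stored under "ip:port".
client_key = build_token_service(NN_HOST, NN_IP, NN_PORT, use_ip=True)
credentials = {client_key: "HDFS_DELEGATION_TOKEN"}

# Cluster side with use_ip = false: the lookup uses "hostname:port" and misses.
cluster_key = build_token_service(NN_HOST, NN_IP, NN_PORT, use_ip=False)
mismatched = select_token(credentials, cluster_key)

# After the fix both sides use use_ip = false, so the keys agree.
fixed_key = build_token_service(NN_HOST, NN_IP, NN_PORT, use_ip=False)
fixed_credentials = {fixed_key: "HDFS_DELEGATION_TOKEN"}
matched = select_token(fixed_credentials, cluster_key)

print("lookup with mismatched settings:", mismatched)
print("lookup with matching settings:", matched)
```

In this model the first lookup finds nothing because the two sides build different keys for the same NameNode, which would correspond to the localizer having no usable token and attempting Kerberos instead.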