I have created my Hadoop provider and configured my virtual index. However, when I search the virtual index, I get the following error in the Splunk search window:
"[hadoop_hie_hdfs] BlockMissingException - Could not obtain block: BP-1447578430-10.9.104.12-1453857005466:blk_1075773485_2044567 file=/opsanalytics/snow/Dim_Configuration_Item_CM_Approver/000001_0"
Splunk 6.5
Hadoop CLI 2.2.0
Based on this link, it looks like your Name Node cannot find the blocks:
https://thebipalace.com/2016/05/16/hadoop-error-org-apache-hadoop-hdfs-blockmissingexception-could-n...
Hi @rdagan ,
I'm facing the same error. Moreover, I'm using a single-node cluster in DEV, and my block does exist:
2016-12-19 02:30:33,798 INFO org.apache.hadoop.hdfs.server.datanode.DirectoryScanner: BlockPool BP-1826813176-10.109.137.83-1480588903531 Total blocks: 21, missing metadata files:0, missing block files:0, missing blocks in memory:0, mismatched blocks:0
But I still get the error below when running a query:
BlockMissingException - Could not obtain block: BP-1826813176-10.109.137.83-1480588903531:blk_1073741844_1020 file=/data/input/splunk/linux2/README.txt
I also restarted the nodes, but I still get the error.
@sbrice - Were you able to fix the error?
I assume the data is on your data node, but for some reason your name node cannot access it.
My recommendation is to run these from the command line (not from the Splunk UI):
1) hadoop fs -text /data/input/splunk/linux2/README.txt
2) hadoop fs -text hdfs://<your name node>:8020/data/input/splunk/linux2/README.txt
3) Run MapReduce jobs on this file from the command line, as the Splunk user
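The first two checks above could be wrapped in a small script so they are easy to repeat on both the search head and a cluster node. This is only a sketch: check_hdfs_read is a hypothetical helper name, the NameNode host is a placeholder you must supply, and port 8020 is assumed from the command above.

```shell
#!/usr/bin/env bash
# Sketch only: try to read one HDFS file two ways and report which one fails.
# check_hdfs_read is a hypothetical helper, not part of Hadoop or Splunk.

check_hdfs_read() {
  local path="$1"          # HDFS path, e.g. /data/input/splunk/linux2/README.txt
  local namenode="$2"      # NameNode host (placeholder, supply your own)
  local port="${3:-8020}"  # NameNode RPC port; 8020 is assumed here
  local failures=0

  # 1) Resolve the path through the default file system of the local Hadoop config.
  if hadoop fs -text "$path" > /dev/null; then
    echo "OK   default FS: $path"
  else
    echo "FAIL default FS: $path"
    failures=$((failures + 1))
  fi

  # 2) Address the NameNode explicitly, which rules out a wrong default file system.
  local uri="hdfs://${namenode}:${port}${path}"
  if hadoop fs -text "$uri" > /dev/null; then
    echo "OK   explicit URI: $uri"
  else
    echo "FAIL explicit URI: $uri"
    failures=$((failures + 1))
  fi

  # Exit status is the number of failed checks (0 means both reads worked).
  return "$failures"
}
```

After sourcing the script as the Splunk user, call it as: check_hdfs_read /data/input/splunk/linux2/README.txt <your name node>. If both checks fail outside Splunk as well, the problem is on the Hadoop side rather than in the virtual index configuration.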
Hi @rdagan ,
Thank you so much for guiding me.
None of the commands worked on the search head (SH) command line.
When I tried them on the Hadoop cluster, I got the warnings below.
So I deleted the corrupt blocks and re-added the data to the DataNodes.
Do you know why blocks become corrupt in Hadoop? I know this is outside Splunk's scope, but I'm asking out of curiosity.
16/12/20 02:21:16 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/12/20 02:21:17 INFO hdfs.DFSClient: No node available for BP-1826813176-10.109.137.83-1480588903531:blk_1073741842_1018 file=/data/input/splunk/linux2/LICENSE.txt
16/12/20 02:21:17 INFO hdfs.DFSClient: Could not obtain BP-1826813176-10.109.137.83-1480588903531:blk_1073741842_1018 from any node: java.io.IOException: No live nodes contain block BP-1826813176-10.109.137.83-1480588903531:blk_1073741842_1018 after checking nodes = [], ignoredNodes = null No live nodes contain current block Block locations: Dead nodes: . Will get new block locations from namenode and retry...
16/12/20 02:21:17 WARN hdfs.DFSClient: DFS chooseDataNode: got # 1 IOException, will wait for 127.55817333292474 msec.
16/12/20 02:21:17 INFO hdfs.DFSClient: No node available for BP-1826813176-10.109.137.83-1480588903531:blk_1073741842_1018 file=/data/input/splunk/linux2/LICENSE.txt
16/12/20 02:21:17 INFO hdfs.DFSClient: Could not obtain BP-1826813176-10.109.137.83-1480588903531:blk_1073741842_1018 from any node: java.io.IOException: No live nodes contain block BP-1826813176-10.109.137.83-1480588903531:blk_1073741842_1018 after checking nodes = [], ignoredNodes = null No live nodes contain current block Block locations: Dead nodes: . Will get new block locations from namenode and retry...
16/12/20 02:21:17 WARN hdfs.DFSClient: DFS chooseDataNode: got # 2 IOException, will wait for 7945.315192663103 msec.
16/12/20 02:21:25 INFO hdfs.DFSClient: No node available for BP-1826813176-10.109.137.83-1480588903531:blk_1073741842_1018 file=/data/input/splunk/linux2/LICENSE.txt
16/12/20 02:21:25 INFO hdfs.DFSClient: Could not obtain BP-1826813176-10.109.137.83-1480588903531:blk_1073741842_1018 from any node: java.io.IOException: No live nodes contain block BP-1826813176-10.109.137.83-1480588903531:blk_1073741842_1018 after checking nodes = [], ignoredNodes = null No live nodes contain current block Block locations: Dead nodes: . Will get new block locations from namenode and retry...
16/12/20 02:21:25 WARN hdfs.DFSClient: DFS chooseDataNode: got # 3 IOException, will wait for 6954.636509794202 msec.
16/12/20 02:21:32 WARN hdfs.DFSClient: Could not obtain block: BP-1826813176-10.109.137.83-1480588903531:blk_1073741842_1018 file=/data/input/splunk/linux2/LICENSE.txt No live nodes contain current block Block locations: Dead nodes: . Throwing a BlockMissingException
16/12/20 02:21:32 WARN hdfs.DFSClient: Could not obtain block: BP-1826813176-10.109.137.83-1480588903531:blk_1073741842_1018 file=/data/input/splunk/linux2/LICENSE.txt No live nodes contain current block Block locations: Dead nodes: . Throwing a BlockMissingException
16/12/20 02:21:32 WARN hdfs.DFSClient: DFS Read
org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block: BP-1826813176-10.109.137.83-1480588903531:blk_1073741842_1018 file=/data/input/splunk/linux2/LICENSE.txt
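For anyone hitting the same output, here is a rough sketch of how the affected blocks could be pulled out of a client log like the one above and then cross-checked with hdfs fsck. The helper names list_missing_blocks and fsck_report are hypothetical; the fsck options shown (-files, -blocks, -locations, -list-corruptfileblocks) are standard, but check them against your Hadoop version first.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are made up for illustration.

# Read DFSClient output on stdin and print each distinct
# "<block id> <file>" pair that hit "Could not obtain block".
list_missing_blocks() {
  grep -o 'Could not obtain block: [^ ]* file=[^ ]*' \
    | sed -e 's/^Could not obtain block: //' -e 's/ file=/ /' \
    | sort -u
}

# Ask the NameNode what it knows about the blocks under a path.
fsck_report() {
  local path="${1:-/}"
  # Per-file block list with the DataNodes that hold each replica.
  hdfs fsck "$path" -files -blocks -locations
  # Only the files that currently have corrupt blocks.
  hdfs fsck "$path" -list-corruptfileblocks
}
```

For example, save the client output to a file and run list_missing_blocks < client.log (client.log is a placeholder name), then run fsck_report /data/input/splunk. Note that hdfs fsck also has a -delete option that removes corrupted files, so treat that one as destructive and only use it when the data can be reloaded.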
Since none of these Hadoop commands work, I would recommend adding your Hadoop node with a tool like Ambari or Cloudera Manager, then adding the data and HDFS directories with that same management tool. Ambari and Cloudera Manager are very good at eliminating many of the issues that come up when creating a Hadoop cluster.
Here is a link that describes some of the reasons for data corruption in HDFS: http://hadoopinrealworld.com/dealing-with-data-corruption-in-hdfs/
Hi @rdagan ,
That was really informative. Thank you so much for your help! 🙂