I have installed Hunk 6.1.3 on a CentOS 6 Linux host and connected it to a CentOS 6 Linux-based CDH5 Hadoop cluster.
I have installed Hunk under /usr/local/hunk and set up my configuration files to pull CSV-based data off of HDFS.
[hadoop@hc2nn local]$ pwd
/usr/local/hunk/etc/system/local
[hadoop@hc2nn local]$ cat indexes.conf
vix.family = hadoop
vix.command.arg.3 = $SPLUNK_HOME/bin/jars/SplunkMR-s6.0-hy2.0.jar
vix.env.HADOOP_HOME = /usr/lib/hadoop
vix.env.JAVA_HOME = /usr/lib/jvm/jre-1.6.0-openjdk.x86_64
vix.fs.default.name = hdfs://hc2nn:8020
vix.splunk.home.hdfs = /user/hadoop/hunk/workdir
vix.mapreduce.framework.name = yarn
vix.yarn.resourcemanager.address = hc2nn:8032
vix.yarn.resourcemanager.scheduler.address = hc2nn:8030
vix.mapred.job.map.memory.mb = 1024
vix.yarn.app.mapreduce.am.staging-dir = /user
vix.splunk.search.recordreader.csv.regex = \.csv$
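For reference, the provider settings above sit under a provider stanza, and searching requires a matching virtual index stanza. A rough sketch of how my indexes.conf is laid out, assuming the provider is named cdh5 and using a placeholder input path (the real path points at my CSV directory on HDFS):

[provider:cdh5]
# ... the vix.* provider settings shown above ...

[cdh5_vindex]
# Bind this virtual index to the provider and point it at the data;
# the trailing /... tells Hunk to recurse into subdirectories.
vix.provider = cdh5
vix.input.1.path = /user/hadoop/csvdata/...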
[hadoop@hc2nn local]$ cat props.conf
REPORT-csvreport = extractcsv
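The extractcsv report referenced above is defined in transforms.conf. A minimal sketch of that stanza, assuming the CSV columns include manufacturer, model and c02_g_km (my real file has more fields):

[extractcsv]
# Delimiter-based extraction: split each line on commas and
# assign the resulting values to these field names in order.
DELIMS = ","
FIELDS = "manufacturer", "model", "c02_g_km"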
Hunk is running on the host hc2nn and I can log in at http://hc2nn:8000. I can run searches against my virtual
index, select columns to display, and create reports and dashboards. What I want to do, though, is create a report
from the search pane that shows the minimum CO2 emissions for each manufacturer and model. I would then like
to limit the output to the top 20 minimum values.
Please excuse the mistakes, but I understand that I can do something like this:
index=cdh5_vindex manufacturer model c02_g_km | stats min(c02_g_km) as minco2 | table manufacturer model minco2
I know that this isn't the correct format, but I wondered whether someone could advise on the correct approach. I can create
single-column reports and dashboards, but I would like to create something a little more complicated.
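My best guess so far, assuming the extracted field names really are manufacturer, model and c02_g_km, is something along these lines (unverified):

index=cdh5_vindex
| stats min(c02_g_km) as minco2 by manufacturer, model
| sort minco2
| head 20

The idea is that stats ... by produces one row per manufacturer/model pair, sort orders the rows by ascending minco2, and head 20 keeps only the 20 smallest values. Is that the right approach, or is there a better way?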
I have installed Splunk today on both a Windows 7 64-bit server and a 32-bit CentOS Linux machine. I have also installed version 1.2.2 of the Hadoop Connect app from Splunk as a .tgz file.
I want to connect to a CDH5 CentOS Linux-based Hadoop cluster built with the CDH5 manager. I have set my HDFS URI, my JAVA_HOME and HADOOP_HOME, as well as the NameNode HTTP port. When I click Save on both Windows and Linux, I get the error:
Unable to connect to Hadoop cluster 'hdfs://hc2nn:8020/' with principal 'None': Invalid HADOOP_HOME. Cannot find Hadoop command under bin directory HADOOP_HOME='/opt/cloudera/parcels/CDH-5.1.3-1.cdh5.1.3.p0.12'.
I can see the hadoop command within the $HADOOP_HOME/bin directory, and I have checked that the command works, i.e. I can connect to HDFS and do a listing. I wondered whether anyone had seen this error before?
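For example, this is roughly the check I ran on the Linux machine, using the parcel path from the error message above:

[hadoop@hc2nn ~]$ export HADOOP_HOME=/opt/cloudera/parcels/CDH-5.1.3-1.cdh5.1.3.p0.12
[hadoop@hc2nn ~]$ ls $HADOOP_HOME/bin/hadoop
/opt/cloudera/parcels/CDH-5.1.3-1.cdh5.1.3.p0.12/bin/hadoop
[hadoop@hc2nn ~]$ $HADOOP_HOME/bin/hadoop fs -ls hdfs://hc2nn:8020/

The hadoop binary is clearly present under $HADOOP_HOME/bin, and the fs -ls against the NameNode returns a listing, so I am not sure why the app reports an invalid HADOOP_HOME.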