I have installed Hunk 6.1.3 on a CentOS 6 Linux host and connected it to a CentOS 6 Linux-based CDH5 Hadoop cluster.
I have installed Hunk under /usr/local/hunk and set up my configuration files to pull CSV-based data off of HDFS.
[hadoop@hc2nn local]$ pwd
/usr/local/hunk/etc/system/local
[hadoop@hc2nn local]$ cat indexes.conf
vix.family = hadoop
vix.command.arg.3 = $SPLUNK_HOME/bin/jars/SplunkMR-s6.0-hy2.0.jar
vix.env.HADOOP_HOME = /usr/lib/hadoop
vix.env.JAVA_HOME = /usr/lib/jvm/jre-1.6.0-openjdk.x86_64
vix.fs.default.name = hdfs://hc2nn:8020
vix.splunk.home.hdfs = /user/hadoop/hunk/workdir
vix.mapreduce.framework.name = yarn
vix.yarn.resourcemanager.address = hc2nn:8032
vix.yarn.resourcemanager.scheduler.address = hc2nn:8030
vix.mapred.job.map.memory.mb = 1024
vix.yarn.app.mapreduce.am.staging-dir = /user
vix.splunk.search.recordreader.csv.regex = \.csv$
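For reference, the provider settings above sit under a provider stanza, and searching requires a matching virtual index stanza. A rough sketch of how my indexes.conf is laid out, assuming the provider is named cdh5 and using a placeholder input path (the real path points at my CSV directory on HDFS):

[provider:cdh5]
# ... the vix.* provider settings shown above ...

[cdh5_vindex]
# Bind this virtual index to the provider and point it at the data;
# the trailing /... tells Hunk to recurse into subdirectories.
vix.provider = cdh5
vix.input.1.path = /user/hadoop/csvdata/...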
[hadoop@hc2nn local]$ cat props.conf
REPORT-csvreport = extractcsv
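The extractcsv report referenced above is defined in transforms.conf. A minimal sketch of that stanza, assuming the CSV columns include manufacturer, model and c02_g_km (my real file has more fields):

[extractcsv]
# Delimiter-based extraction: split each line on commas and
# assign the resulting values to these field names in order.
DELIMS = ","
FIELDS = "manufacturer", "model", "c02_g_km"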
Hunk is running on the host hc2nn and I can log in at http://hc2nn:8000. I can run searches against my virtual
index, select columns to display, and create reports and dashboards. What I want to do, though, is create a report
from the search pane that shows the minimum CO2 emissions for each manufacturer and model. I would then like
to limit the output to the top 20 minimum values.
Please excuse the mistakes, but I understand that I can do something like this:
index=cdh5_vindex manufacturer model c02_g_km | stats min(c02_g_km) as minco2 | table manufacturer model minco2
I know that this isn't the correct format, but I wondered whether someone could advise on the correct approach. I can create
single-column reports and dashboards, but I would like to create something a little more complicated.
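My best guess so far, assuming the extracted field names really are manufacturer, model and c02_g_km, is something along these lines (unverified):

index=cdh5_vindex
| stats min(c02_g_km) as minco2 by manufacturer, model
| sort minco2
| head 20

The idea is that stats ... by produces one row per manufacturer/model pair, sort orders the rows by ascending minco2, and head 20 keeps only the 20 smallest values. Is that the right approach, or is there a better way?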
I have installed Splunk today on both a Windows 7 64-bit server and a 32-bit CentOS Linux machine. I have also installed version 1.2.2 of the Hadoop Connect app from Splunk as a .tgz file.
I want to connect to a CDH5 CentOS Linux-based Hadoop cluster built with the CDH5 manager. I have set my HDFS URI, my JAVA_HOME and HADOOP_HOME, as well as the NameNode HTTP port. When I click Save on both Windows and Linux, I get the error:
Unable to connect to Hadoop cluster 'hdfs://hc2nn:8020/' with principal 'None': Invalid HADOOP_HOME. Cannot find Hadoop command under bin directory HADOOP_HOME='/opt/cloudera/parcels/CDH-5.1.3-1.cdh5.1.3.p0.12'.
I can see the hadoop command within the $HADOOP_HOME/bin directory, and I have checked that the command works, i.e. I can connect to HDFS and do a listing. I wondered whether anyone had seen this error before?
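For example, this is roughly the check I ran on the Linux machine, using the parcel path from the error message above:

[hadoop@hc2nn ~]$ export HADOOP_HOME=/opt/cloudera/parcels/CDH-5.1.3-1.cdh5.1.3.p0.12
[hadoop@hc2nn ~]$ ls $HADOOP_HOME/bin/hadoop
/opt/cloudera/parcels/CDH-5.1.3-1.cdh5.1.3.p0.12/bin/hadoop
[hadoop@hc2nn ~]$ $HADOOP_HOME/bin/hadoop fs -ls hdfs://hc2nn:8020/

The hadoop binary is clearly present under $HADOOP_HOME/bin, and the fs -ls against the NameNode returns a listing, so I am not sure why the app reports an invalid HADOOP_HOME.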