Below is the solution.
Once the Hadoop Connect app has been installed on the Splunk cluster (the HadoopConnect app UI is shown in the attached screenshot), we can run the validations below from the CLI to make sure the component is ready to use. The examples are from a Unix system; a Windows equivalent is shown after step (1), and the remaining steps translate the same way.
1) Check that the HadoopConnect app is installed
Solution:
./splunk display app HadoopConnect
Example:
[vivek.bitla@ip-172-31-25-223 clusters]$ /home/vivek.bitla/vb/splunk/bin/splunk display app HadoopConnect
HadoopConnect CONFIGURED ENABLED VISIBLE
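On Windows, the same check would use splunk.exe; the install path below is illustrative and assumes a default installation:
"C:\Program Files\Splunk\bin\splunk.exe" display app HadoopConnect
The remaining steps translate the same way, substituting splunk.exe for ./splunk.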
2) Add the Hadoop cluster.
Solution:
Add the cluster details and other inputs, just as we would on the UI, to "etc/apps/HadoopConnect/local/clusters.conf". We need to refresh the Splunk UI session to see them on the UI.
Example:
[vivek.bitla@ip-172-31-25-223 local]$ cat clusters.conf
[ec2-54-193-49-94.us-west-1.compute.amazonaws.com:9000]
hadoop_home = /home/hadoop
java_home = /usr/lib/jvm/jre
uri = hdfs://ec2-54-193-49-94.us-west-1.compute.amazonaws.com:9000
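To verify the stanza from the CLI without opening the UI, we can list the merged configuration with btool, Splunk's standard configuration inspector:
./splunk btool clusters list --app=HadoopConnect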
3) Explore the Hadoop cluster.
Solution:
Once we add the cluster in step (2), we can explore the contents of the cluster with the command below:
./splunk search '|hdfs ls <HDFS URI>'
Example:
[vivek.bitla@ip-172-31-25-223 local]$ /home/vivek.bitla/vb/splunk/bin/splunk search '|hdfs ls hdfs://ec2-54-193-49-94.us-west-1.compute.amazonaws.com:9000/user/vivek.bitla'
/user/vivek.bitla/e1c487eddc877523facf5181911b9e65.1386956774.cursor
/user/vivek.bitla/tutorialdata.zip
/user/vivek.bitla/vb
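Assuming the other hdfs directives documented for Hadoop Connect (lsr for a recursive listing, read for file contents) are available in your version, the cluster can be explored further the same way, e.g.:
/home/vivek.bitla/vb/splunk/bin/splunk search '|hdfs lsr hdfs://ec2-54-193-49-94.us-west-1.compute.amazonaws.com:9000/user/vivek.bitla'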
4) Manage HDFS Inputs for Indexed HDFS Data by Source.
Solution:
Add the HDFS input stanzas to "etc/apps/HadoopConnect/local/inputs.conf". We need to refresh the Splunk UI session to see them on the UI.
Example:
[vivek.bitla@ip-172-31-25-223 local]$ cat inputs.conf
[hdfs://ec2-54-193-49-94.us-west-1.compute.amazonaws.com:9000/user/vivek.bitla]
disabled = 0
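As in step (2), btool can confirm from the CLI that the input stanza was picked up:
./splunk btool inputs list --app=HadoopConnect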
5) Run a Splunk search on the source data.
Solution:
Though I am able to add a "build export" stanza in export.conf, somehow it isn't working, so I am implementing an alternate approach:
./splunk search '<search string>' -app HadoopConnect -output csv
Example:
[vivek.bitla@ip-172-31-25-223 local]$ /home/vivek.bitla/vb/splunk/bin/splunk search 'sourcetype=access_* status=200 action=purchase | top categoryId ' -app HadoopConnect -output csv
categoryId,"_tc",count,percent
STRATEGY,2643,806,"30.495649"
ARCADE,2643,493,"18.653046"
TEE,2643,367,"13.885736"
ACCESSORIES,2643,348,"13.166856"
SIMULATION,2643,246,"9.307605"
SHOOTER,2643,245,"9.269769"
SPORTS,2643,138,"5.221339"
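Note that the CLI caps the number of returned events by default (100, if I recall correctly); when the full result set is needed, for instance to feed the export in step (6), the -maxout flag (0 = unlimited) can be added:
/home/vivek.bitla/vb/splunk/bin/splunk search 'sourcetype=access_* status=200 action=purchase | top categoryId' -app HadoopConnect -output csv -maxout 0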
6) Export the Splunk search results to HDFS.
Solution:
We should store the Splunk search output in a flat file in one of the supported formats (rawdata, table, raw, csv, auto, json), then ingest it into the Hadoop data node with the HDFS commands.
Example:
/home/vivek.bitla/vb/splunk/bin/splunk search 'sourcetype=access_* status=200 action=purchase | top categoryId ' -app HadoopConnect -output csv > /home/vivek.bitla/SplunkOutput/purchase.csv
hadoop fs -copyFromLocal /home/vivek.bitla/SplunkOutput/purchase.csv hdfs://ec2-54-193-49-94.us-west-1.compute.amazonaws.com:9000/user/vivek.bitla/SplunkOutput/
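To confirm the file landed in HDFS, list the target directory with the standard Hadoop filesystem command:
hadoop fs -ls hdfs://ec2-54-193-49-94.us-west-1.compute.amazonaws.com:9000/user/vivek.bitla/SplunkOutput/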
7) Schedule the Splunk Build Export.
Solution:
We can schedule the build export through crontab (internally Splunk uses the same mechanism) by adding an entry like the one below to export the data to HDFS.
Example:
[vivek.bitla@ip-172-31-25-223 local]$ crontab -l
00,15,30,45 * * * * $Splunk_Exports/purchase_category.sh
**** purchase_category.sh contains the code that handles the search and export; a minimal sketch follows.
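A minimal sketch of what purchase_category.sh could contain, combining steps (5) and (6); the paths and the HDFS URI are the ones from the examples above, everything else is an assumption to be adapted:
#!/bin/sh
# purchase_category.sh - run the Splunk search and push the CSV to HDFS.
# Assumption: for cron runs, Splunk credentials must be supplied
# non-interactively (e.g. via the -auth user:pass option).
SPLUNK_BIN=/home/vivek.bitla/vb/splunk/bin/splunk
OUT=/home/vivek.bitla/SplunkOutput/purchase.csv

# Step (5): run the search and store the results as a flat CSV file
"$SPLUNK_BIN" search 'sourcetype=access_* status=200 action=purchase | top categoryId' \
    -app HadoopConnect -output csv > "$OUT"

# Step (6): ingest the flat file into HDFS
# (copyFromLocal fails if the target already exists, so remove it first)
hadoop fs -rm hdfs://ec2-54-193-49-94.us-west-1.compute.amazonaws.com:9000/user/vivek.bitla/SplunkOutput/purchase.csv 2>/dev/null
hadoop fs -copyFromLocal "$OUT" hdfs://ec2-54-193-49-94.us-west-1.compute.amazonaws.com:9000/user/vivek.bitla/SplunkOutput/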
In addition, we can add some more sanity checks: whether the ports through which Splunk communicates are open, and whether the read/write/execute permissions on the HDFS folders are set correctly.
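A sketch of those checks (the ports shown are Splunk's defaults, 8089 for management and 8000 for the web UI, plus the NameNode port from the examples above; adjust for your deployment):
# Are the Splunk ports listening locally?
netstat -an | grep -E ':(8089|8000)'
# Is the HDFS NameNode reachable from the Splunk host?
nc -zv ec2-54-193-49-94.us-west-1.compute.amazonaws.com 9000
# Do the HDFS folders carry the expected read/write/execute permissions?
hadoop fs -ls hdfs://ec2-54-193-49-94.us-west-1.compute.amazonaws.com:9000/user/vivek.bitla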