Below is the solution
Once the Hadoop Connect app has been installed on the Splunk cluster (the HadoopConnect app UI is shown in the attached screenshot), we can run the validations below from the CLI to confirm the component is ready to use. The examples are from a Unix host; on Windows the same splunk CLI commands apply.
1) Check that the HadoopConnect app is installed.
./splunk display app HadoopConnect
[vivek.bitla@ip-172-31-25-223 clusters]$ /home/vivek.bitla/vb/splunk/bin/splunk display app HadoopConnect
HadoopConnect CONFIGURED ENABLED VISIBLE
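The check above can be scripted so an OPS host can run it unattended. A minimal sketch, assuming SPLUNK_HOME points at the install used in this example (the `app_status_ok` helper is hypothetical, not part of HadoopConnect):

```shell
#!/bin/sh
# Sketch: verify from the CLI that HadoopConnect is installed and enabled.
# SPLUNK_HOME below is an assumption; point it at your own install.
SPLUNK_HOME="${SPLUNK_HOME:-/home/vivek.bitla/vb/splunk}"

# Pure helper: reads `splunk display app` output on stdin and succeeds
# only if the app is reported as ENABLED (whole word, so DISABLED fails).
app_status_ok() {
    grep -qw 'ENABLED'
}

# Run the live check only when the splunk binary is actually present.
if [ -x "$SPLUNK_HOME/bin/splunk" ]; then
    if "$SPLUNK_HOME/bin/splunk" display app HadoopConnect | app_status_ok; then
        echo "HadoopConnect is installed and enabled"
    else
        echo "HadoopConnect is missing or disabled" >&2
        exit 1
    fi
fi
```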
2) Add the Hadoop cluster.
Add the cluster details and other inputs, exactly as you would on the UI, to "etc/apps/HadoopConnect/local/clusters.conf". Refresh the Splunk UI session to see them in the UI.
[vivek.bitla@ip-172-31-25-223 local]$ cat clusters.conf
hadoop_home = /home/hadoop
java_home = /usr/lib/jvm/jre
uri = hdfs://ec2-54-193-49-94.us-west-1.compute.amazonaws.com:9000
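Once clusters.conf is in place, we can validate it from the CLI by reading the configured uri back out of the file and confirming the cluster answers. A sketch, assuming the paths from this example; the `conf_value` helper is hypothetical:

```shell
#!/bin/sh
# Sketch: read the configured HDFS URI out of clusters.conf and
# confirm the NameNode is reachable. Paths here are illustrative.
CLUSTERS_CONF="${CLUSTERS_CONF:-/home/vivek.bitla/vb/splunk/etc/apps/HadoopConnect/local/clusters.conf}"

# Pure helper: print the value of a "key = value" line for a given key.
conf_value() {
    key="$1"; file="$2"
    sed -n "s/^${key}[[:space:]]*=[[:space:]]*//p" "$file" | head -n 1
}

# Run the live check only when the conf file and hadoop CLI exist.
if [ -f "$CLUSTERS_CONF" ] && command -v hadoop >/dev/null 2>&1; then
    uri="$(conf_value uri "$CLUSTERS_CONF")"
    # Listing the cluster root fails fast if the NameNode is unreachable.
    if hadoop fs -ls "$uri/" >/dev/null; then
        echo "cluster reachable: $uri"
    else
        echo "cannot reach cluster: $uri" >&2
        exit 1
    fi
fi
```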
3) Explore the Hadoop cluster.
Once the cluster is added in step (2), we can explore its contents with the following command:
[vivek.bitla@ip-172-31-25-223 local]$ /home/vivek.bitla/vb/splunk/bin/splunk search '|hdfs ls hdfs://ec2-54-193-49-94.us-west-1.compute.amazonaws.com:9000/user/vivek.bitla'
4) Manage HDFS Inputs for Indexed HDFS Data by Source.
Add the input details to "etc/apps/HadoopConnect/local/inputs.conf". Refresh the Splunk UI session to see them in the UI.
[vivek.bitla@ip-172-31-25-223 local]$ cat inputs.conf
disabled = 0
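To confirm from the CLI that the input actually took effect, we can ask btool for the merged configuration rather than trusting the local file alone. A sketch, assuming SPLUNK_HOME from this example; the `input_enabled` helper is hypothetical:

```shell
#!/bin/sh
# Sketch: confirm via btool that the HadoopConnect input is enabled.
# btool prints the effective, merged inputs.conf for the app.
SPLUNK_HOME="${SPLUNK_HOME:-/home/vivek.bitla/vb/splunk}"

# Pure helper: reads conf text on stdin, succeeds if the input is enabled.
input_enabled() {
    grep -q '^disabled[[:space:]]*=[[:space:]]*0'
}

# Run the live check only when the splunk binary is present.
if [ -x "$SPLUNK_HOME/bin/splunk" ]; then
    if "$SPLUNK_HOME/bin/splunk" btool inputs list --app=HadoopConnect | input_enabled; then
        echo "HadoopConnect input is enabled"
    else
        echo "HadoopConnect input is disabled or missing" >&2
        exit 1
    fi
fi
```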
5) Run a Splunk search on the source data.
Though I am able to add a "build export" in export.conf, somehow it isn't working, so I implemented an alternate approach:
./splunk search '<search>'
[vivek.bitla@ip-172-31-25-223 local]$ /home/vivek.bitla/vb/splunk/bin/splunk search 'sourcetype=access_* status=200 action=purchase | top categoryId ' -app HadoopConnect -output csv
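For an unattended validation, the same search can be run non-interactively and checked for results by counting the CSV rows it returns. A sketch under the same assumed SPLUNK_HOME; the `csv_row_count` helper is hypothetical:

```shell
#!/bin/sh
# Sketch: run the search from the CLI and fail if it returns no results.
SPLUNK_HOME="${SPLUNK_HOME:-/home/vivek.bitla/vb/splunk}"

# Pure helper: count CSV data rows on stdin (total lines minus the header;
# empty input counts as zero rows).
csv_row_count() {
    awk 'END { print (NR > 0 ? NR - 1 : 0) }'
}

# Run the live search only when the splunk binary is present.
if [ -x "$SPLUNK_HOME/bin/splunk" ]; then
    rows=$("$SPLUNK_HOME/bin/splunk" search \
        'sourcetype=access_* status=200 action=purchase | top categoryId' \
        -app HadoopConnect -output csv | csv_row_count)
    if [ "$rows" -gt 0 ]; then
        echo "search returned $rows rows"
    else
        echo "search returned no results" >&2
        exit 1
    fi
fi
```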
6) Export the Splunk search results to HDFS.
We store the Splunk search output in a flat file in one of the rawdata, table, raw, csv, auto, or json formats, then ingest it into the Hadoop data node with HDFS commands.
/home/vivek.bitla/vb/splunk/bin/splunk search 'sourcetype=access_* status=200 action=purchase | top categoryId ' -app HadoopConnect -output csv > /home/vivek.bitla/SplunkOutput/purchase.csv
hadoop fs -copyFromLocal purchase.csv hdfs://ec2-54-193-49-94.us-west-1.compute.amazonaws.com:9000/user/vivek.bitla/SplunkOutput/
7) Schedule the Splunk Build Export.
We can schedule the Build Export through crontab by adding an entry like the one below to export the data to HDFS (Splunk's own scheduler offers similar cron-style scheduling).
[vivek.bitla@ip-172-31-25-223 local]$ crontab -l
00,15,30,45 * * * * $Splunk_Exports/purchase_category.sh
**** purchase_category.sh contains the code that handles the search and export.
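The original does not show purchase_category.sh, so here is a minimal sketch of what it might contain, reusing the search and HDFS commands from steps 5 and 6. The paths and the timestamped-filename helper are assumptions:

```shell
#!/bin/sh
# Sketch of purchase_category.sh: search, write CSV locally, push to HDFS.
# All paths below mirror the earlier examples but are assumptions.
SPLUNK_HOME="${SPLUNK_HOME:-/home/vivek.bitla/vb/splunk}"
OUT_DIR="${OUT_DIR:-/home/vivek.bitla/SplunkOutput}"
HDFS_DEST="hdfs://ec2-54-193-49-94.us-west-1.compute.amazonaws.com:9000/user/vivek.bitla/SplunkOutput"

# Pure helper: build a timestamped file name so repeated cron runs
# do not overwrite each other.
export_filename() {
    echo "purchase_${1}.csv"
}

# Run only when both the splunk and hadoop CLIs are present.
if [ -x "$SPLUNK_HOME/bin/splunk" ] && command -v hadoop >/dev/null 2>&1; then
    out="$OUT_DIR/$(export_filename "$(date +%Y%m%d%H%M)")"
    "$SPLUNK_HOME/bin/splunk" search \
        'sourcetype=access_* status=200 action=purchase | top categoryId' \
        -app HadoopConnect -output csv > "$out" || exit 1
    hadoop fs -copyFromLocal "$out" "$HDFS_DEST/" || exit 1
fi
```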
In addition, we can add further sanity checks: that the ports Splunk communicates over are open, and that the read-write-execute permissions on the HDFS folders are correct.
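Those extra sanity checks can be sketched as below. The port numbers are assumed defaults (8089 for splunkd management, 9000 for the HDFS NameNode), and the write-probe approach is one way to test permissions without relying on version-specific flags; the helpers are hypothetical:

```shell
#!/bin/sh
# Sketch of the extra sanity checks: ports open and HDFS write permission.
# Host, ports, and paths mirror this example and are assumptions.
HDFS_HOST="ec2-54-193-49-94.us-west-1.compute.amazonaws.com"

# Pure helper: build the HDFS export directory URI from host and user.
hdfs_export_dir() {
    echo "hdfs://$1:9000/user/$2/SplunkOutput"
}

# Port checks: run only when nc is available.
if command -v nc >/dev/null 2>&1; then
    nc -z -w 5 "$HDFS_HOST" 9000 && echo "NameNode port 9000 open" \
        || echo "NameNode port 9000 closed" >&2
    nc -z -w 5 localhost 8089 && echo "splunkd port 8089 open" \
        || echo "splunkd port 8089 closed" >&2
fi

# Permission check: probe by creating and removing a marker file,
# which exercises write and delete rights on the directory.
if command -v hadoop >/dev/null 2>&1; then
    dest="$(hdfs_export_dir "$HDFS_HOST" vivek.bitla)"
    if hadoop fs -touchz "$dest/.perm_check" && hadoop fs -rm "$dest/.perm_check"; then
        echo "read-write OK on $dest"
    fi
fi
```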
How may we implement a Splunk validation check from the CLI? Could anyone suggest commands to implement the tasks below?
Manage HDFS Inputs
search and export
I know that the semi-manual approach through the Splunk interface is fast and easy, but this solution requires running the check from an OPS-managed Splunk server instance.
Also, has anyone developed a solution with the Splunk API/SDK?
Your prompt response will be highly appreciated.