Splunk Hadoop Connect Validation through the CLI

vivekb1590
Engager

How can we implement a Splunk validation check from the CLI?

Could anyone suggest commands to implement the tasks below?


Manage HDFS Inputs

Add a cluster

Search and export


I know that a semi-manual approach through the Splunk web interface is fast and easy, but this solution requires running the checks from an OPS-managed Splunk server instance.

Also, has anyone developed such a solution with the Splunk API/SDK?

Your prompt response will be highly appreciated.

Thanks,


vivekb1590
Engager

Below is the solution

Once the Hadoop Connect app has been installed on the Splunk cluster (the HadoopConnect app UI is shown in the attached screenshot), we can run the following validations from the CLI to ensure the component is ready to use. A Unix example is given for each task.

1) Check that the HadoopConnect app is installed

Solution:
./splunk display app HadoopConnect

Example:
[vivek.bitla@ip-172-31-25-223 clusters]$ /home/vivek.bitla/vb/splunk/bin/splunk display app HadoopConnect
HadoopConnect CONFIGURED ENABLED VISIBLE
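
For anyone who would rather script this check against the Splunk REST API than the CLI, a minimal sketch is to query the apps endpoint on the management port (8089 by default) and confirm the app is returned; the credentials below are placeholders:

# Query the local apps endpoint; an entry for HadoopConnect in the response
# confirms the app is installed (admin:changeme is a placeholder credential)
curl -k -u admin:changeme https://localhost:8089/services/apps/local/HadoopConnect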

2) Add the Hadoop cluster.

Solution:
Add the cluster details and other inputs, just as we do in the UI, to etc/apps/HadoopConnect/local/clusters.conf. We need to refresh the Splunk UI session to see them in the UI.

Example:
[vivek.bitla@ip-172-31-25-223 local]$ cat clusters.conf
[ec2-54-193-49-94.us-west-1.compute.amazonaws.com:9000]
hadoop_home = /home/hadoop
java_home = /usr/lib/jvm/jre
uri = hdfs://ec2-54-193-49-94.us-west-1.compute.amazonaws.com:9000
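
To confirm that Splunk actually picks up the stanza (and not just that the file exists on disk), a quick check, assuming the standard btool utility, is:

# List the effective clusters.conf settings as Splunk resolves them, with the
# file each setting comes from (--debug)
./splunk btool clusters list --app=HadoopConnect --debug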

3) Explore the Hadoop cluster.

Solution:
Once we add the cluster in step (2), we can explore the contents of the cluster with the command below.

./splunk search '|hdfs ls <hdfs uri>'

Example:
[vivek.bitla@ip-172-31-25-223 local]$ /home/vivek.bitla/vb/splunk/bin/splunk search '|hdfs ls hdfs://ec2-54-193-49-94.us-west-1.compute.amazonaws.com:9000/user/vivek.bitla'
/user/vivek.bitla/e1c487eddc877523facf5181911b9e65.1386956774.cursor
/user/vivek.bitla/tutorialdata.zip
/user/vivek.bitla/vb

4) Manage HDFS Inputs for Indexed HDFS Data by Source.

Solution:
Add the HDFS input path, just as we do in the UI, to etc/apps/HadoopConnect/local/inputs.conf. We need to refresh the Splunk UI session to see it in the UI.

Example:
[vivek.bitla@ip-172-31-25-223 local]$ cat inputs.conf
[hdfs://ec2-54-193-49-94.us-west-1.compute.amazonaws.com:9000/user/vivek.bitla]
disabled = 0

5) Run a Splunk search on the source data.

Solution:
Although I am able to add a build export stanza in export.conf, for some reason it isn't working, so I am implementing it with an alternate approach using ./splunk search.

Example:
[vivek.bitla@ip-172-31-25-223 local]$ /home/vivek.bitla/vb/splunk/bin/splunk search 'sourcetype=access_* status=200 action=purchase | top categoryId ' -app HadoopConnect -output csv
categoryId,"_tc",count,percent
STRATEGY,2643,806,"30.495649"
ARCADE,2643,493,"18.653046"
TEE,2643,367,"13.885736"
ACCESSORIES,2643,348,"13.166856"
SIMULATION,2643,246,"9.307605"
SHOOTER,2643,245,"9.269769"
SPORTS,2643,138,"5.221339"

6) Export the Splunk search results to HDFS.

Solution:
We store the Splunk search output in a flat file in one of the supported output formats (rawdata, table, raw, csv, auto, json), then ingest it into the Hadoop cluster with HDFS commands.

Example:
/home/vivek.bitla/vb/splunk/bin/splunk search 'sourcetype=access_* status=200 action=purchase | top categoryId ' -app HadoopConnect -output csv > /home/vivek.bitla/SplunkOutput/purchase.csv
hadoop fs -copyFromLocal purchase.csv hdfs://ec2-54-193-49-94.us-west-1.compute.amazonaws.com:9000/user/vivek.bitla/SplunkOutput/
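
To validate that the export actually landed in HDFS, a couple of follow-up checks using the same paths can be added:

# Confirm the file exists in HDFS and inspect the first few lines
hadoop fs -ls hdfs://ec2-54-193-49-94.us-west-1.compute.amazonaws.com:9000/user/vivek.bitla/SplunkOutput/
hadoop fs -cat hdfs://ec2-54-193-49-94.us-west-1.compute.amazonaws.com:9000/user/vivek.bitla/SplunkOutput/purchase.csv | head -5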

7) Schedule the Splunk Build Export.

Solution:
We can schedule the build export through cron (internally Splunk uses the same mechanism) by adding an entry like the one below to export the data to HDFS.

Example:
[vivek.bitla@ip-172-31-25-223 local]$ crontab -l
00,15,30,45 * * * * $Splunk_Exports/purchase_category.sh

**** purchase_category.sh contains the code that handles the search and export; a sketch is given below.
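
A minimal sketch of what purchase_category.sh could look like, reusing the paths from the earlier examples; the credentials and the overwrite flag are assumptions and must be adapted to your environment:

#!/bin/sh
# Hypothetical sketch of purchase_category.sh: run the search, write a CSV,
# then push it to HDFS. Paths and credentials below are placeholders.
SPLUNK_HOME=/home/vivek.bitla/vb/splunk
OUT=/home/vivek.bitla/SplunkOutput/purchase.csv
HDFS_DEST=hdfs://ec2-54-193-49-94.us-west-1.compute.amazonaws.com:9000/user/vivek.bitla/SplunkOutput/

# Run the search non-interactively; cron has no Splunk session, so pass
# credentials explicitly with -auth
$SPLUNK_HOME/bin/splunk search 'sourcetype=access_* status=200 action=purchase | top categoryId' -app HadoopConnect -output csv -auth admin:changeme > $OUT

# Copy the results into HDFS, overwriting any previous export (-f)
hadoop fs -copyFromLocal -f $OUT $HDFS_DEST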

In addition, we can add some more sanity checks: whether the ports through which Splunk and Hadoop communicate are open, and whether the HDFS folders have the required read/write/execute permissions. A sketch of such checks follows.
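
A minimal sketch of such checks, assuming the default Splunk management port (8089) and the NameNode port taken from the cluster URI above (9000):

# Check that the Splunk management port and the HDFS NameNode port are reachable
nc -z localhost 8089 && echo "Splunk management port open"
nc -z ec2-54-193-49-94.us-west-1.compute.amazonaws.com 9000 && echo "NameNode port open"

# Check ownership and permissions on the HDFS folders used for input and export
hadoop fs -ls hdfs://ec2-54-193-49-94.us-west-1.compute.amazonaws.com:9000/user/vivek.bitla/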

