How to monitor my hadoop cluster CDP 7.1.3 with Splunk?

cuian01 · 10-13-2020

Dear All,

I'm very new to Splunk!

In my organization, Splunk Enterprise is deployed, and management wants to monitor all data platforms and applications in Splunk.

I recently deployed Cloudera CDP 7.1.3 in our data center, and management expects Splunk to analyze the Hadoop log files. How can I use Splunk to proactively monitor user activity, service logs, and server logs in CDP 7.1.3? Are any additional components required?

I'd appreciate it if you could share your knowledge on this!

Thanks

inventsekar · 10-13-2020

Hi @cuian01 check these 2 apps please:

https://splunkbase.splunk.com/app/3134/

The Hadoop Monitoring Add-on allows a Splunk software administrator to collect Yarn and Hadoop log files as well as OS metrics from the Hadoop nodes. The app was tested with Hortonworks, Cloudera, and MapR distributions. After the Splunk platform indexes the events, you can analyze the data by building searches and dashboards. The add-on includes a few sample prebuilt dashboard panels and reports.

https://splunkbase.splunk.com/app/1180/

Splunk Hadoop Connect provides bi-directional integration to easily and reliably move data between Splunk and Hadoop.

thanks and best regards,
Sekar

PS - If this or any post helped you in any way, pls consider upvoting, thanks for reading !

cuian01 · 10-13-2020

@inventsekar,

Thanks for your swift reply!

Actually, I checked the "Hadoop Monitor" app before, but the sample links all point to Hortonworks. Now that Cloudera and Hortonworks have merged, does "Hadoop Monitor" support the latest CDP 7.1.3 release?

rdagan_splunk · 11-20-2020

The Cloudera-specific log locations should be as follows:

### Cloudera Yarn Log Files

[monitor:///var/log/hadoop-yarn/*nodemanager*]
sourcetype = hadoop_nodemanager
index = hadoopmon_metrics

[monitor:///var/log/hadoop-yarn/*resourcemanager*]
sourcetype = hadoop_resourcemanager
index = hadoopmon_metrics

[monitor:///var/log/hadoop-yarn/*proxyserver*]
sourcetype = hadoop_proxyserver
index = hadoopmon_metrics

[monitor:///var/log/hadoop-mapreduce/*historyserver*]
sourcetype = hadoop_historyserver
index = hadoopmon_metrics

### Cloudera Hadoop Log Files

[monitor:///var/log/hadoop-hdfs/*datanode*]
sourcetype = hadoop_datanode
index = hadoopmon_metrics

[monitor:///var/log/hadoop-hdfs/*namenode*]
sourcetype = hadoop_namenode
index = hadoopmon_metrics

[monitor:///var/log/hadoop-hdfs/*secondarynamenode*]
sourcetype = hadoop_secondarynamenode
index = hadoopmon_metrics

[monitor:///var/log/hadoop-hdfs/*journalnode*]
sourcetype = hadoop_journalnode
index = hadoopmon_metrics

### Cloudera Configuration Files

[monitor:///etc/hadoop/conf/*]
crcSalt = <SOURCE>
disabled = 0
sourcetype = hadoop_global_conf
index = hadoopmon_configs
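
The stanzas above write to two indexes, hadoopmon_metrics and hadoopmon_configs, so both need to exist on the indexers before data arrives. If they are not already defined in your deployment, a minimal indexes.conf sketch might look like the following. The paths are an assumption based on the default $SPLUNK_DB layout; adjust them, and add any retention settings, to match your environment.

```ini
### Indexes referenced by the monitor stanzas (paths are assumed defaults)

[hadoopmon_metrics]
homePath   = $SPLUNK_DB/hadoopmon_metrics/db
coldPath   = $SPLUNK_DB/hadoopmon_metrics/colddb
thawedPath = $SPLUNK_DB/hadoopmon_metrics/thaweddb

[hadoopmon_configs]
homePath   = $SPLUNK_DB/hadoopmon_configs/db
coldPath   = $SPLUNK_DB/hadoopmon_configs/colddb
thawedPath = $SPLUNK_DB/hadoopmon_configs/thaweddb
```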

And after you collect the logs you can run searches similar to these:

[Yarn All Applications]
index=hadoopmon_metrics sourcetype=hadoop_resourcemanager appId=* | eval elapsed_time = finishTime - startTime | table appId name user queue finalStatus elapsed_time

[Yarn Top User]
index=hadoopmon_metrics sourcetype=hadoop_resourcemanager appId=* | top user

[Yarn Success Rate]
index=hadoopmon_metrics sourcetype=hadoop_resourcemanager appId=* | top finalStatus
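
For the proactive monitoring part of the question, a search along the same lines could be scheduled as an alert. This is only a sketch: the stanza name is made up for illustration, and it assumes that failed applications are logged with finalStatus=FAILED and that the fields used in the searches above are extracted in your environment.

```
[Yarn Failed Applications]
index=hadoopmon_metrics sourcetype=hadoop_resourcemanager appId=* finalStatus=FAILED | stats count by user queue
```

Grouping by user and queue keeps the result small enough to use as an alert condition, for example triggering when count exceeds a threshold you choose.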
