From the Splunk User Manual for the Hadoop App, I found that Splunk uses FileContext to collect metrics from Hadoop services.
Since JMX provides the same metrics, why does Splunk use FileContext?
Because FileContext requires parsing log data, it may perform poorly compared to JMX.
Could you please clarify?
For HadoopOps, we evaluated several approaches for metrics collection: a JMX client, HTTP endpoints, a custom SplunkContext, and Hadoop's FileContext. We went with FileContext in version 1.0 based on performance and resource footprint, ease of setup, and stable behavior across versions:
Performance/Footprint - the overhead from metrics logs was negligible, since forwarders were already processing other log files. Other tradeoffs were inconclusive (e.g. disk space vs. an extra process invocation)
Ease of Setup - FileContext was the simplest to set up, especially for admins unfamiliar with the JMX stack. Enabling JMX or HTTP required another remote interface to be secured and accessed, adding overhead for pilot projects
Stability - FileContext was more stable than the other approaches. Hadoop JMX was missing the MBean for "mapred" metrics in early releases, while the HTTP servlet broke in 0.20.203 until it was replaced by a different endpoint in 0.20.205. SplunkContext was ruled out for the same reasons; GangliaContext broke more than once as the metrics system was overhauled between Hadoop Metrics and Hadoop Metrics2.
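For reference, FileContext is enabled through Hadoop's metrics configuration (the Metrics1-era hadoop-metrics.properties file). A minimal sketch might look like the following; the file paths and the 10-second period are illustrative examples, not HadoopOps defaults:

```properties
# Emit DFS, MapReduce, and JVM metrics to flat files via FileContext
# (Hadoop Metrics1-style configuration; paths and periods are examples only)
dfs.class=org.apache.hadoop.metrics.file.FileContext
dfs.period=10
dfs.fileName=/var/log/hadoop/dfs_metrics.log

mapred.class=org.apache.hadoop.metrics.file.FileContext
mapred.period=10
mapred.fileName=/var/log/hadoop/mapred_metrics.log

jvm.class=org.apache.hadoop.metrics.file.FileContext
jvm.period=10
jvm.fileName=/var/log/hadoop/jvm_metrics.log
```

A Splunk forwarder then simply monitors those files alongside the other Hadoop logs it is already tailing.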
The assumptions above may have changed since 1.0, and may not apply to your Hadoop operations! The most successful customers use the HadoopOps App as a starting point for configuration; Splunk can easily support metrics collection using any approach you prefer...
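On the parsing-cost point raised in the question: FileContext writes each metrics record as a single line of comma-separated key=value pairs, so the per-line work is a simple string split rather than heavyweight parsing. A minimal Python sketch, assuming the general "context.record: key=value, ..." line shape (the sample line is illustrative, not captured output):

```python
def parse_metrics_line(line):
    """Parse one FileContext-style record line of the form
    'context.record: key1=val1, key2=val2, ...' into a dict."""
    header, _, body = line.partition(":")
    record = {"_record": header.strip()}  # e.g. "jvm.metrics"
    for pair in body.split(","):
        key, sep, value = pair.strip().partition("=")
        if sep:  # skip malformed fragments
            record[key] = value
    return record

# Illustrative line in the FileContext shape (values are made up)
sample = "jvm.metrics: hostName=nn01, processName=NameNode, memHeapUsedM=42.5"
parsed = parse_metrics_line(sample)
print(parsed["_record"])       # jvm.metrics
print(parsed["memHeapUsedM"])  # 42.5
```

In practice this split-and-index work is comparable to what the forwarder already does for ordinary log files, which is why the measured overhead was negligible.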