How to import data from Hadoop to Splunk?

eylonronen
Explorer

Hi, I'm looking for a decent way to import data from Hadoop into Splunk.
The methods we have found so far are:
1. Use NiFi to get the files from Hadoop, then use the PutSplunk processor to index them. I have doubts about this method because it seems like overkill for such a simple task.
2. Use the NFS gateway to mount HDFS on the forwarder, then use a regular monitor input. This is what we've been doing so far, but we want to replace it because Hadoop's NFS gateway is problematic and not as fast as we need it to be.
3. Hadoop Connect. This Splunk product really got our hopes up. When we tested it on a single file, it performed dramatically better than the NFS solution; however, it was slower than the first with many small files (which is what we have in our production environment). Also, Hadoop Connect is a modular input, and as such it doesn't support indexing CSV files, so I had to dive into the code and alter it to parse CSV files into key-value pairs so that they would be indexed. It showed the same performance difference after my changes.
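To illustrate the CSV workaround mentioned in point 3, here is a minimal sketch of turning CSV rows into key-value lines. It is not the actual change made to Hadoop Connect; the function name `csv_to_kv_lines` and the sample data are hypothetical, and it assumes the first row of each file is a header.

```python
import csv
import io


def csv_to_kv_lines(csv_text, delimiter=","):
    """Yield one 'key="value" ...' line per CSV data row.

    Hypothetical helper: assumes the first row holds the column names.
    """
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=delimiter)
    for row in reader:
        pairs = []
        for key, value in row.items():
            if key is None:
                # Cells beyond the header width have no column name; skip them.
                continue
            value = "" if value is None else value
            # Escape backslashes and double quotes so the quoted value stays intact.
            escaped = value.replace("\\", "\\\\").replace('"', '\\"')
            pairs.append(f'{key}="{escaped}"')
        yield " ".join(pairs)


# Made-up sample data, for illustration only.
sample = "host,status,bytes\nweb01,200,512\nweb02,404,0\n"
for line in csv_to_kv_lines(sample):
    print(line)
```

Each data row becomes a single line such as `host="web01" status="200" bytes="512"`, which is the kind of key-value text that can then be handed to the indexer as one event per row.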

For now, I'd like to understand why Hadoop Connect performs poorly and how to improve it. Any help with that, or a suggestion for another indexing method, would be much appreciated.

Thank you very very much 🙂

rdagan_splunk
Splunk Employee

A few ideas:
1) Use NiFi, but with InvokeHTTP instead of PutSplunk. Here is how to do it: https://www.youtube.com/watch?v=Dq9qKU9HZYM&t=25s
2) Use Hadoop Connect, but mount Hadoop instead of using the normal NameNode. Here is how to do it: http://docs.splunk.com/Documentation/HadoopConnect/1.2.5/DeployHadoopConnect/Configuretheapp#Map_to_...
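For idea 1, the thread does not say which Splunk endpoint InvokeHTTP should post to. Assuming it is Splunk's HTTP Event Collector (HEC), the request NiFi would need to produce looks roughly like the sketch below. The host name, token, and `sourcetype` value are placeholders, and `build_hec_request` is a hypothetical helper, not part of NiFi or Splunk.

```python
import json
import urllib.request


def build_hec_request(hec_url, token, events, sourcetype="hdfs:file"):
    """Build (but do not send) one HEC request carrying several events.

    Hypothetical helper: the default sourcetype is a placeholder.
    """
    # HEC accepts several JSON event objects stacked in one request body,
    # which helps when the source is many small files.
    body = "\n".join(
        json.dumps({"event": event, "sourcetype": sourcetype}) for event in events
    )
    return urllib.request.Request(
        hec_url,
        data=body.encode("utf-8"),
        headers={
            "Authorization": f"Splunk {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder URL and token, for illustration only.
req = build_hec_request(
    "https://splunk.example.com:8088/services/collector/event",
    "00000000-0000-0000-0000-000000000000",
    ["line one", "line two"],
)
print(req.get_method(), req.full_url)
# Passing req to urllib.request.urlopen would send it; that step is left out here.
```

In NiFi terms, the URL, the `Authorization` header, and the JSON body would map to the InvokeHTTP processor's remote URL, a header property, and the flow file content respectively. Batching several events per request is a design choice worth testing against the many-small-files workload described above.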


stamstam
Explorer

We've actually tried mounting Hadoop and using Hadoop Connect with the locally mounted file system, but it was still slower than Hadoop Connect's remote cluster mode.


rdagan_splunk
Splunk Employee

For me, on the MapR distribution, it made a big difference in performance.
