
How to import data from Hadoop to Splunk?

eylonronen
Explorer

Hi, I'm looking for a decent way to import data from Hadoop into Splunk.
The methods we have found so far are the following:
1. Use NiFi to get the files from Hadoop, and then use the PutSplunk processor to index them. I have doubts about this method, because it seems like overkill for such a simple action.
2. Use the NFS gateway to mount HDFS on the forwarder, and then use a regular monitor input. This is what we've been doing so far. However, we are looking to replace it because Hadoop's NFS gateway is problematic, and it is also not as fast as we need it to be.
3. Hadoop Connect, a product by Splunk that really got our hopes up. When we tested it, it performed dramatically better than the NFS solution on a single file, but it was slower than the NFS solution with many small files (which is what we have in our production environment). Also, Hadoop Connect is a modular input, and as such it doesn't support indexing CSV files, so I had to dive into the code and alter it to parse CSV files into key-value pairs so that they would be indexed. It still showed the same performance difference after the changes I made.
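For anyone trying the same CSV workaround, here is a minimal sketch of the idea of turning CSV rows into key-value lines. It is not Hadoop Connect's actual code; the function name `csv_to_kv_lines` and the sample data are hypothetical, and it assumes the first row of each file is a header.

```python
import csv
import io


def csv_to_kv_lines(csv_text):
    """Yield one line of key="value" pairs per CSV data row.

    Hypothetical helper: assumes the first row of csv_text is the header.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        pairs = []
        for key, value in row.items():
            if key is None:
                # Extra columns beyond the header have no name; skip them.
                continue
            # Escape embedded double quotes so each value stays one token.
            escaped = (value or "").replace('"', '\\"')
            pairs.append('{}="{}"'.format(key, escaped))
        yield " ".join(pairs)


# Made-up sample data, for illustration only.
sample = "host,status,bytes\nweb01,200,512\nweb02,404,0\n"
kv_lines = list(csv_to_kv_lines(sample))
for line in kv_lines:
    print(line)
```

Each output line carries its own field names, so it can be indexed as a standalone event without relying on the header row.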

For now, I would like to understand why Hadoop Connect performs poorly in our case and how to improve it. If anyone can help with that, or suggest another indexing method, it would be much appreciated.

Thank you very very much 🙂

rdagan_splunk
Splunk Employee

A few ideas:
1) Use NiFi, but with InvokeHTTP instead of PutSplunk. Here is how to do it: https://www.youtube.com/watch?v=Dq9qKU9HZYM&t=25s
2) Use Hadoop Connect, but mount Hadoop locally instead of going through the normal name node. Here is how to do it: http://docs.splunk.com/Documentation/HadoopConnect/1.2.5/DeployHadoopConnect/Configuretheapp#Map_to_...
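As a rough illustration of the HTTP route, the sketch below batches several events into a single request for Splunk's HTTP Event Collector (HEC). It assumes the InvokeHTTP approach targets HEC, which this thread does not confirm. The host, token, sourcetype, and function names are placeholders, and the request is only built, not sent.

```python
import json
import urllib.request


def build_hec_body(events, sourcetype="hdfs:file"):
    """Stack one JSON object per event into a single HEC request body.

    The sourcetype value is a placeholder chosen for this example.
    """
    return "\n".join(
        json.dumps({"event": event, "sourcetype": sourcetype})
        for event in events
    )


def build_hec_request(base_url, token, events):
    """Prepare (but do not send) a POST to the HEC event endpoint."""
    body = build_hec_body(events).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/services/collector/event",
        data=body,
        headers={"Authorization": "Splunk " + token},
        method="POST",
    )


request = build_hec_request(
    "https://splunk.example.com:8088",        # hypothetical HEC host
    "00000000-0000-0000-0000-000000000000",   # placeholder token
    ["line from file A", "line from file B"],
)
print(request.full_url)
# To actually send it: urllib.request.urlopen(request)
```

Sending many events per request rather than one request per small file may reduce per-request overhead, though whether it helps in a given environment would need to be measured.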

stamstam
Explorer

We've actually tried mounting Hadoop and using Hadoop Connect with the locally mounted file system, but it was still slower than Hadoop Connect's remote cluster mode.

rdagan_splunk
Splunk Employee

For me, using the MapR distribution, it made a big difference in performance.
