How to import data from Hadoop to Splunk?

eylonronen
Explorer

Hi, I'm looking for a decent way to import data from Hadoop into Splunk.
So far we have found the following methods:
1. Use NiFi to fetch the files from Hadoop, then index them with the PutSplunk processor. I have doubts about this method because it seems like overkill for such a simple task.
2. Use the NFS gateway to mount HDFS on the forwarder, then use a regular monitor input. This is what we have been doing so far, but we want to replace it because Hadoop's NFS is problematic and not as fast as we need it to be.
3. Hadoop Connect, a Splunk product that really got our hopes up. When we tested it, it performed dramatically better than the NFS solution on a single file, but it was slower than the first with many small files (which is what we have in our production environment). Also, Hadoop Connect is a modular input and, as such, does not support indexing CSV files, so I had to dive into the code and alter it to parse CSV files into key-value pairs so they would be indexed. It showed the same performance difference after my changes.
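
For readers wondering what the CSV workaround in item 3 might look like, here is a minimal sketch of turning CSV text into key-value lines. It is not the actual change made to Hadoop Connect; the function name `csv_to_kv_lines` and the `key="value"` output format are assumptions chosen for illustration.

```python
import csv
import io


def csv_to_kv_lines(text):
    """Turn CSV text with a header row into one key="value" line per record.

    Hypothetical helper: the name and output format are illustrative only.
    """
    reader = csv.DictReader(io.StringIO(text))
    lines = []
    for row in reader:
        pairs = []
        for key, value in row.items():
            if key is None:
                # Extra cells beyond the header have no field name; skip them.
                continue
            # Missing cells come back as None; escape embedded double quotes.
            safe = (value or "").replace('"', '\\"')
            pairs.append(f'{key}="{safe}"')
        lines.append(" ".join(pairs))
    return lines
```

For example, `csv_to_kv_lines("host,status\nweb01,200\n")` yields a single line with a `host` pair and a `status` pair, which is the shape of event that key-value extraction can pick up.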

For now, I would like to understand why Hadoop Connect performs poorly and how to improve it. If anyone can help with this, or suggest another indexing method, it would be much appreciated.

Thank you very very much 🙂

rdagan_splunk
Splunk Employee

A few ideas:
1) Use NiFi, but with InvokeHTTP instead of PutSplunk. Here is how to do it: https://www.youtube.com/watch?v=Dq9qKU9HZYM&t=25s
2) Use Hadoop Connect, but mount Hadoop instead of using the normal name node. Here is how to do it: http://docs.splunk.com/Documentation/HadoopConnect/1.2.5/DeployHadoopConnect/Configuretheapp#Map_to_...
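
As a rough illustration of idea 1, the sketch below builds the kind of HTTP request a client could send to Splunk, assuming the target is Splunk's HTTP Event Collector (HEC); the thread itself does not say which endpoint the video uses. The helper name `build_hec_request`, the host, the token, and the `hdfs:file` sourcetype are all placeholders. Several events are packed into one request body, which may help when the source is many small files.

```python
import json
import urllib.request


def build_hec_request(base_url, token, events, sourcetype="hdfs:file"):
    """Build (but do not send) one POST carrying a batch of events.

    Hypothetical helper; the sourcetype default is a placeholder.
    """
    # HEC accepts several JSON event objects concatenated in a single body.
    body = "\n".join(
        json.dumps({"event": event, "sourcetype": sourcetype})
        for event in events
    )
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/services/collector/event",
        data=body.encode("utf-8"),
        headers={
            "Authorization": "Splunk " + token,
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Sending would then be a matter of passing the result to `urllib.request.urlopen`, with the base URL and token taken from your own HEC configuration.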

stamstam
Explorer

We've actually tried mounting Hadoop and using Hadoop Connect with the locally mounted file system, but it was still slower than Hadoop Connect's remote cluster option.

rdagan_splunk
Splunk Employee

For me, using the MapR distribution, it made a big difference in performance.