All Apps and Add-ons

How to import data from Hadoop to Splunk?

eylonronen
Explorer

Hi, I would like anyone to help me find a real decent way of importing data from Hadoop to Splunk.
The methods we have found are the following:
1. Use Nifi to get the files from Hadoop, and then use the put Splunk processor to index them. I really doubt this method, because it seems like a bit overkill for a simple action.
2. Use NFS gateway to mount the hdfs on the forwarder and then use regular monitoring input. This is what we've been doing so far, however, we are looking to replace it because Hadoop's nfs is problematic, and also it is not as fast as we need it to be.
3. Hadoop Connect. A product by Splunk that really got our hopes up, and when we tested it, it showed better performance dramatically than the nfs solution on a single file, however, it was slower than the first with many small files(as we have in our production environment). Also, Hadoop connect is a modular input, and as such, it doesn't support indexing csv files, so I had to dive into the code and alter it to parse csv files to key-value pairs so they will be indexed. It still showed the same performance difference after the changes I've made.

As of now, my wish is to understand Hadoop Connect's poor performance and enhance it. So if anyone can help me with this, or by giving me another indexing method this will be much appreciated.

Thank you very very much 🙂

rdagan_splunk
Splunk Employee
Splunk Employee

Few ideas:
1) Using Nifi, but with InvokeHttp instead of putSplunk. Here is how to do it: https://www.youtube.com/watch?v=Dq9qKU9HZYM&t=25s
2) Use Hadoop Connect, but mount Hadoop instead of using the normal name node. Here is how to do it: http://docs.splunk.com/Documentation/HadoopConnect/1.2.5/DeployHadoopConnect/Configuretheapp#Map_to_...

0 Karma

stamstam
Explorer

We've actually tried mounting hadoop and using Hadoop Connect with locally mounted FS, but it still lost to the remote cluster function of Hadoop Connect.

0 Karma

rdagan_splunk
Splunk Employee
Splunk Employee

For me, using MapR distribution, it made a big difference as far as performance.

0 Karma
Got questions? Get answers!

Join the Splunk Community Slack to learn, troubleshoot, and make connections with fellow Splunk practitioners in real time!

Meet up IRL or virtually!

Join Splunk User Groups to connect and learn in-person by region or remotely by topic or industry.

Get Updates on the Splunk Community!

Why Splunk Customers Should Attend Cisco Live 2026 Las Vegas

Why Splunk Customers Should Attend Cisco Live 2026 Las Vegas     Cisco Live 2026 is almost here, and this ...

What Is the Name of the USB Key Inserted by Bob Smith? (BOTS Hint, Not the Answer)

Hello Splunkers,   So you searched, “what is the name of the usb key inserted by bob smith?”  Not gonna lie… ...

Automating Threat Operations and Threat Hunting with Recorded Future

    Automating Threat Operations and Threat Hunting with Recorded Future June 29, 2026 | Register   Is your ...