How to import data from Hadoop to Splunk?

eylonronen · ‎01-31-2018

Hi, I'm looking for a decent way of importing data from Hadoop to Splunk.
The methods we have found are the following:
1. Use NiFi to get the files from Hadoop, and then use the PutSplunk processor to index them. I have doubts about this method, because it seems like overkill for such a simple action.
2. Use the NFS gateway to mount HDFS on the forwarder and then use a regular monitor input. This is what we've been doing so far, but we are looking to replace it because Hadoop's NFS is problematic and is not as fast as we need it to be.
3. Hadoop Connect. A product by Splunk that really got our hopes up. When we tested it, it performed dramatically better than the NFS solution on a single file; however, it was slower than the first with many small files (as we have in our production environment). Also, Hadoop Connect is a modular input, and as such it doesn't support indexing CSV files, so I had to dive into the code and alter it to parse CSV files into key-value pairs so they would be indexed. It still showed the same performance difference after the changes I made.
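
For illustration, the CSV-to-key-value conversion mentioned in point 3 could look roughly like the sketch below. This is not the actual Hadoop Connect change; the function name `csv_to_kv_events` is hypothetical, and it assumes each file has a header row and that every record has the same columns as the header.

```python
import csv
import io


def csv_to_kv_events(csv_text):
    """Turn CSV text with a header row into one key="value" line per record.

    Hypothetical helper: assumes well-formed rows that match the header.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    events = []
    for row in reader:
        pairs = []
        for key, value in row.items():
            # Escape embedded double quotes so the value stays one token.
            escaped = (value or "").replace('"', '\\"')
            pairs.append(f'{key}="{escaped}"')
        events.append(" ".join(pairs))
    return events


if __name__ == "__main__":
    sample = "host,status\nweb01,200\nweb02,500\n"
    for line in csv_to_kv_events(sample):
        print(line)
```

Each output line is a flat string of `key="value"` pairs, which Splunk's automatic key-value extraction can typically pick up at search time.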

For now, I want to understand why Hadoop Connect performs poorly and how to improve it. If anyone can help with this, or suggest another indexing method, it would be much appreciated.

Thank you very very much 🙂

rdagan_splunk · ‎01-31-2018

A few ideas:
1) Use NiFi, but with InvokeHTTP instead of PutSplunk. Here is how to do it: https://www.youtube.com/watch?v=Dq9qKU9HZYM&t=25s
2) Use Hadoop Connect, but mount Hadoop instead of using the normal name node. Here is how to do it: http://docs.splunk.com/Documentation/HadoopConnect/1.2.5/DeployHadoopConnect/Configuretheapp#Map_to_...
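
As a rough illustration of idea 1, sending events to Splunk over HTTP usually means posting to the HTTP Event Collector (HEC). Whether the linked video uses HEC is an assumption here. The sketch below only builds a batched request; the helper name `build_hec_request`, the host, the token, and the `hdfs:file` sourcetype are all placeholders. Batching the contents of many small files into one request is one way to reduce per-file overhead.

```python
import json
import urllib.request


def build_hec_request(base_url, token, events, sourcetype="hdfs:file"):
    """Build one POST request carrying several events for Splunk HEC.

    Hypothetical helper: base_url, token and sourcetype are placeholders.
    """
    # HEC accepts several JSON event objects in a single request body.
    body = "\n".join(
        json.dumps({"event": event, "sourcetype": sourcetype}) for event in events
    )
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/services/collector/event",
        data=body.encode("utf-8"),
        headers={
            "Authorization": "Splunk " + token,
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_hec_request(
        "https://splunk.example.com:8088",  # placeholder host
        "00000000-0000-0000-0000-000000000000",  # placeholder token
        ['host="web01" status="200"', 'host="web02" status="500"'],
    )
    print(req.get_method(), req.full_url)
```

Passing the returned object to `urllib.request.urlopen` would actually send it; in NiFi, the equivalent would be configuring InvokeHTTP with the same URL, method, and Authorization header.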

stamstam · ‎01-31-2018

We've actually tried mounting Hadoop and using Hadoop Connect with the locally mounted FS, but it was still slower than Hadoop Connect's remote cluster option.

rdagan_splunk · ‎02-02-2018

For me, using the MapR distribution, it made a big difference in performance.
