We're loving Splunk for monitoring the health of our system, but we also want to pump the data into Hadoop for some larger-scale statistical analysis. Has anyone forwarded Splunk data into Hadoop, or used some other mechanism to get it there? Another option is to send all data to rsyslog first, and then have Hadoop and Splunk both read from that. Any thoughts?
Yes. It's been over two years since this question was asked, but Splunk Hadoop Connect is a free app that supports bi-directional communication between Splunk and Hadoop. Check it out in the apps section.
If you need to perform arbitrary operations on the dataset, you can extend the search language yourself, and you can arrange for these operations to be performed across your set of indexers.
If, however, you already have a prefabricated solution built out in Hadoop, and reviewing the results in the Splunk UI is not particularly useful for your workflow, then it becomes a simple matter of data export, for which there are various paths.
A transforming operator in the search language could theoretically pass the data out to Hadoop and back again, but latency could be a problem there.
Our goal is for you to be able to accomplish statistical analysis of your log data within Splunk and its search language, since we have a whole UI focused on that problem space, which provides value to you.
Note that Splunk is itself a horizontally scalable solution for large scale statistical analysis - check out this white paper for more info: http://www.splunk.com/web_assets/pdfs/secure//Splunk_and_MapReduce.pdf
It's kind of a broad question. What data do you want to send to Hadoop? We can simply forward data to another system as it arrives at Splunk, or you can export data after it has been cooked by Splunk.
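To make the "export after Splunk has cooked it" path a bit more concrete, here is a minimal sketch in Python that streams search results out of Splunk's REST API (the `search/jobs/export` endpoint on the management port) into a local file. The host name, the token, the bearer-token authentication, and the helper names `build_export_request` and `export_to_file` are assumptions for illustration only; your deployment may authenticate differently, and this is one possible approach rather than the recommended one.

```python
import urllib.parse
import urllib.request


def build_export_request(base_url, search, token, output_mode="json"):
    """Build a POST request for the search/jobs/export REST endpoint.

    base_url and token are placeholders (assumptions); substitute the
    management URL and credentials of your own Splunk instance.
    """
    # The endpoint expects a full search string, so prepend the
    # "search" command unless the query already starts with it or a pipe.
    if not search.lstrip().startswith(("search ", "|")):
        search = "search " + search
    body = urllib.parse.urlencode(
        {"search": search, "output_mode": output_mode}
    ).encode("utf-8")
    req = urllib.request.Request(
        base_url.rstrip("/") + "/services/search/jobs/export",
        data=body,
        method="POST",
    )
    # Assumption: token-based auth is enabled on the instance.
    req.add_header("Authorization", "Bearer " + token)
    return req


def export_to_file(req, path, chunk_size=64 * 1024):
    """Stream the export response to a local file in fixed-size chunks."""
    with urllib.request.urlopen(req) as resp, open(path, "wb") as out:
        while True:
            chunk = resp.read(chunk_size)
            if not chunk:
                break
            out.write(chunk)


if __name__ == "__main__":
    request = build_export_request(
        "https://splunk.example.com:8089",  # hypothetical host
        "index=main earliest=-1d",          # hypothetical search
        "REPLACE_WITH_TOKEN",
    )
    export_to_file(request, "splunk_export.json")
```

Once the file is on local disk, something like `hdfs dfs -put splunk_export.json /some/hdfs/path` would copy it into Hadoop; the HDFS path here is a placeholder too. For continuous rather than batch transfer, forwarding the raw data as it arrives is probably the better fit.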
What kind of statistical analysis do you want to do? Have you tried to use Splunk and it didn't suffice? What is the dataset size? How many Splunk servers are you using? Have you looked at Summary Indexing on Splunk?
Lots of questions in this answer, but with a bit more detail we can point you in the right direction.