Is anyone pushing Splunk data into Hadoop?

Keith
Engager

We're loving Splunk for monitoring the health of our system, but we also want to pump the data into Hadoop for some larger-scale statistical analysis. Has anyone forwarded Splunk data into Hadoop, or moved it there by some other mechanism? Another option is to send all data to rsyslog first, and then have both Hadoop and Splunk read from that. Any thoughts?

Keith

Ledion_Bitincka
Splunk Employee

Yes. It's been over two years since this question was asked, and in the meantime Splunk Hadoop Connect has become available: it is a free app that supports bi-directional communication between Splunk and Hadoop. Check it out in the apps section.

jrodman
Splunk Employee

If you need to perform arbitrary operations on the dataset, you can extend the search language yourself, and you can arrange for these operations to be performed across your set of indexers.

If, however, you already have a prefabricated solution built out in Hadoop for your problem, and reviewing the results in the Splunk UI is not particularly useful for your workflow, then it becomes a simple story about data export, for which there are various paths.

A transforming operator in the search language could theoretically pass the data out and back again, but there might be latency thorns there.

The goal, from our end, is that you should be able to accomplish statistical analysis of your log data within the Splunk world and search language, since we have a relatively focused UI built around that problem space.

cfrln
Explorer

Note that Splunk is itself a horizontally scalable solution for large-scale statistical analysis. Check out this white paper for more info: http://www.splunk.com/web_assets/pdfs/secure//Splunk_and_MapReduce.pdf

dskillman
Splunk Employee

It's kind of a broad question. What data do you want to send to Hadoop? We can simply forward data to another system as it arrives at Splunk or you can export data after it has been cooked by Splunk.
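As a rough illustration of the export path, here is a minimal Python sketch. It assumes the Splunk REST export endpoint (`/services/search/jobs/export`) queried with `output_mode=json`, which streams one JSON object per line with the event under a `result` key. The host name, the function names, and the choice of fields are hypothetical, and the HTTP call and the HDFS upload are left out, so treat it as a starting point rather than a finished exporter.

```python
import json
import urllib.parse


def build_export_request(base_url, search, earliest="-1h"):
    """Return (url, form-encoded body) for a streaming export search.

    base_url is the management endpoint, e.g. https://splunk.example.com:8089
    (hypothetical host). Authentication is not handled here.
    """
    body = urllib.parse.urlencode({
        "search": search,            # full search string, e.g. "search index=main"
        "earliest_time": earliest,
        "output_mode": "json",       # one JSON object per line
    })
    return base_url.rstrip("/") + "/services/search/jobs/export", body


def export_lines_to_tsv(lines, fields=("_time", "host", "_raw")):
    """Turn newline-delimited JSON export rows into tab-separated records.

    Tabs and newlines inside a value are replaced with spaces so that each
    event stays on one line, which keeps the file easy to split in Hadoop.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        row = json.loads(line).get("result")
        if row is None:              # skip rows that carry no event
            continue
        yield "\t".join(
            str(row.get(f, "")).replace("\t", " ").replace("\n", " ")
            for f in fields
        )
```

The resulting lines could be written to a local file and loaded with something like `hdfs dfs -put`, or streamed to HDFS by whatever client you already use.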

What kind of statistical analysis do you want to do? Have you tried to use Splunk and it didn't suffice? What is the dataset size? How many Splunk servers are you using? Have you looked at Summary Indexing on Splunk?

Lots of questions in the answer, but with a bit more detail we can point you in the right direction.

DJ
