Has anyone used Splunk to ingest data from a company called DataSift? It's data from social media sites. My understanding is that it's a web service which requires a key and a set of parameters, and the results come back in JSON format. I'm not really familiar with JSON. Can someone give me some advice on how to handle this input?
As Michael mentioned, if you are using Splunk Storm, DataSift provides an out-of-the-box "push" destination for delivering your data directly. To set it up, log in to datasift.com and then visit http://datasift.com/destination/splunkstorm to configure the endpoint. There are full instructions here.
Once the data is flowing to Splunk Storm, you can create searches and dashboards. If you pipe your results to spath, it extracts the JSON fields so you can analyse them. For example, to see a list of the top Klout topics you could use:
* | spath | top data.klout.topics{}
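To show why that field name looks the way it does, here is a rough Python sketch of the path naming spath uses: nested object keys are joined with dots and array elements are marked with {}, so a list of topics nested under data.klout shows up as data.klout.topics{}. The sample event is made up and the helper names (flatten, fields) are hypothetical; this only approximates spath's naming for illustration and is not Splunk's implementation.

```python
import json
from collections import defaultdict


def flatten(value, prefix=""):
    """Yield (path, leaf) pairs using spath-style names: '.' for keys, '{}' for arrays."""
    if isinstance(value, dict):
        for key, child in value.items():
            path = f"{prefix}.{key}" if prefix else key
            yield from flatten(child, path)
    elif isinstance(value, list):
        for item in value:
            # Every element of an array shares the same "{}" path, like a multivalue field.
            yield from flatten(item, prefix + "{}")
    else:
        yield prefix, value


def fields(raw_json):
    """Parse one JSON event and group its leaf values by path."""
    result = defaultdict(list)
    for path, leaf in flatten(json.loads(raw_json)):
        result[path].append(leaf)
    return dict(result)


# Hypothetical event, loosely shaped like a social media interaction.
sample = '{"data": {"klout": {"topics": ["Splunk", "Big Data"], "score": 42}}}'
for path, values in fields(sample).items():
    print(path, values)
```

Running it prints data.klout.topics{} with both topics and data.klout.score with its single value, which is why the search above counts individual topics via data.klout.topics{}.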
We're going to make a really cool modular input (hopefully with the query builder built into it). How cool would that be 🙂
In the meantime, we do have some streaming HTTP tooling that connects via an API and should work.
Also, if you're using Splunk Storm, there is already a Data Destination that Nick (Halstead) and the boys at DataSift have built.
But you know, if you have Linux, you can just use curl to write the stream to a file and have Splunk monitor that live file in the meantime. Need help?
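For anyone who would rather script this in Python than use curl, here is a minimal sketch of the same idea: read the HTTP stream line by line and append each JSON object as one line to a file that Splunk monitors. It assumes the endpoint sends one JSON object per line. The stream URL, the Authorization header format and the function names are hypothetical placeholders, not DataSift's documented API, so check your DataSift account documentation for the real values.

```python
import json
import urllib.request

# HYPOTHETICAL placeholders: replace with the real DataSift stream URL and
# whatever authentication scheme your DataSift account actually uses.
STREAM_URL = "https://stream.example.com/YOUR_STREAM_HASH"
AUTH_HEADER = {"Authorization": "username:api_key"}


def append_events(lines, out_path):
    """Append each complete JSON object from `lines` to out_path, one per line.

    Returns the number of events written.
    """
    written = 0
    with open(out_path, "a", encoding="utf-8") as out:
        for raw in lines:
            text = raw.decode("utf-8") if isinstance(raw, bytes) else raw
            text = text.strip()
            if not text:
                continue  # blank keep-alive lines carry no event
            try:
                event = json.loads(text)
            except json.JSONDecodeError:
                continue  # skip partial or malformed lines
            # Re-serialize compactly so every event occupies exactly one line.
            out.write(json.dumps(event) + "\n")
            out.flush()  # let Splunk's file monitor see the event promptly
            written += 1
    return written


def stream_to_file(out_path):
    """Open the (hypothetical) streaming endpoint and append events until it closes."""
    request = urllib.request.Request(STREAM_URL, headers=AUTH_HEADER)
    with urllib.request.urlopen(request) as response:
        # Iterating an HTTP response yields one line (bytes) at a time.
        return append_events(response, out_path)
```

You would call stream_to_file("/tmp/datasift.json") and point a Splunk file monitor (a [monitor://...] stanza in inputs.conf) at the same path. Writing one compact JSON object per line keeps event breaking simple, and spath can then extract the fields at search time.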
Thanks Michael, any ETA on the modular input? In the meantime I'll go the curl route; I can probably handle that.
Hi - I run sales for DataSift in Europe and whilst I can't help directly, I know someone who can! One of my colleagues did some work on exactly this prior to Christmas.
He's on holiday right now, but I'll link him into this discussion and hopefully he can provide some guidance.
Hi Toby, thanks for the response. I look forward to hearing back about it. I've made some headway using the DataSift Python class but haven't quite been able to get the data into Splunk yet.
Thanks!
Some more info: DataSift has a Python class on GitHub at http://datasift.github.com/datasift-python/. It would be super cool to see a modular input created for this. I would do it myself, but I don't quite have the Python skills.
Hi Michael, I'm looking for help getting data in from DataSift.
Do you already have the data in Splunk, or do you need help getting data from DataSift in real time? We have some answers depending on your scenario.