Splunk GUI Rest API Twitter Integration

aaronkorn
Splunk Employee

Hello,

We are looking to use the Splunk REST GUI to connect to Twitter and monitor feeds based on several URL parameters we want to search for.

We have the endpoint defined as https://api.twitter.com/1.1/search/tweets.json, have our authentication credentials entered, and have a sample URL argument of q=UPMC to search Twitter for anything mentioning UPMC, returning in XML format. No data is returned, though, yet when I use the Twitter dev console it works fine: https://dev.twitter.com/console. Is anyone else having issues using the GUI REST integration, or does anyone have a better way to pull in Twitter data based on keywords? Should we worry about defining a response handler and other options in the config?

Thanks!

1 Solution

Damien_Dallimor
Ultra Champion

Well, you will be getting multiple events in the response document, but they are being indexed in Splunk as one single event. That is why the REST API Modular Input has custom response handlers that you can plug in to parse the specific response you are getting back, i.e. split out the individual Twitter events from the JSON response.
You add your custom response handler to bin/responsehandlers.py and declare it on the setup page for your REST input definition.
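
For reference, a hedged sketch of how that declaration might end up looking in your inputs.conf stanza once saved from the setup page (the response_handler key name is an assumption, inferred from the response_handler_args key shown later in this thread):

[rest://Twitter]
response_handler = TwitterEventHandler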

Here is an example of what a custom handler might look like for the Twitter JSON response:

class TwitterEventHandler:

    def __init__(self, **args):
        pass

    def __call__(self, response_object, raw_response_output, response_type, req_args, endpoint):

        if response_type == "json":
            # json and print_xml_stream are already available in
            # bin/responsehandlers.py, where this class lives
            output = json.loads(raw_response_output)
            last_tweet_indexed_id = 0
            # split the single JSON response document into individual tweet events
            for twitter_event in output["statuses"]:
                print_xml_stream(json.dumps(twitter_event))
                if "id_str" in twitter_event:
                    # id_str is a string, so compare numerically
                    tweet_id = int(twitter_event["id_str"])
                    if tweet_id > last_tweet_indexed_id:
                        last_tweet_indexed_id = tweet_id

            if "params" not in req_args:
                req_args["params"] = {}

            # only request tweets newer than the last one we indexed
            req_args["params"]["since_id"] = last_tweet_indexed_id

        else:
            print_xml_stream(raw_response_output)

I see that the raw response back from Twitter also has a created_at field for each event, which you can then use as your Splunk event time value.
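
As a hedged props.conf sketch for using created_at as the event timestamp for the twitter sourcetype (the TIME_PREFIX regex and strptime pattern are assumptions based on Twitter's v1.1 date format, e.g. "Mon Sep 24 03:35:21 +0000 2012"):

[twitter]
TIME_PREFIX = "created_at"\s*:\s*"
TIME_FORMAT = %a %b %d %H:%M:%S %z %Y
MAX_TIMESTAMP_LOOKAHEAD = 40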


Damien_Dallimor
Ultra Champion

Nice. Updated the code example.


aaronkorn
Splunk Employee

id_str instead of id


aaronkorn
Splunk Employee

Awesome! Just had to modify it slightly for the different id. Thanks for all your help!


Damien_Dallimor
Ultra Champion

The REST Mod Input is generic; it can be used in an unknown number of scenarios, so I have to provide an extension mechanism for handling specific behaviours. This is the purpose of responsehandlers.py. Custom handling can also extend beyond output formatting to dynamically calculating URL arguments to add to the request, such as the "since_id" argument, which you need to calculate based on the latest tweet id that you processed. I suggested two ways of performing this above, simple and more advanced, and updated the above untested code snippet to show how this might potentially be done.


aaronkorn
Splunk Employee

Ok. Where should we start in the responsehandlers.py script? I guess I'm not fully understanding the purpose of the responsehandlers.py file and how we would go about passing this to the inputs.conf file.


Damien_Dallimor
Ultra Champion

Maintain a variable in responsehandlers.py that stores the last tweet id, and then use this as the since_id for your next request, repeating this iteratively.

More advanced, but you could potentially also use the Splunk Python SDK from the response handler to execute a Splunk search and ask it for the latest tweet id that you indexed, to use as your since_id.

Also, you can update your REST stanza's "url_args" using the Python SDK, so if you need to persist the since_id value back into your configuration (i.e. to survive restarts), you can do that as well.
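
As an untested sketch of those two SDK ideas (connection details, the search string, and the stanza name are placeholder assumptions for your environment):

import splunklib.client as client
import splunklib.results as results

def latest_indexed_tweet_id(service):
    # ask Splunk for the highest tweet id indexed so far
    reader = results.ResultsReader(service.jobs.oneshot(
        "search index=twitter sourcetype=twitter | stats max(id_str) as max_id"))
    for result in reader:
        if isinstance(result, dict) and result.get("max_id"):
            return int(result["max_id"])
    return 0

def persist_since_id(service, since_id):
    # write since_id back into the REST stanza so it survives restarts
    # (url_args is a comma-delimited key=value list, as in the stanza below)
    stanza = service.confs["inputs"]["rest://Twitter"]
    stanza.update(url_args="q=UPMC,since_id=%d" % since_id)

service = client.connect(host="localhost", port=8089,
                         username="admin", password="changeme")
persist_since_id(service, latest_indexed_tweet_id(service))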


aaronkorn
Splunk Employee

Thanks! This seems to be working pretty well now, but we seem to be ingesting duplicate tweets every time it executes the API call. Is there any way around this? I know there are several different parameters to pass in the request (they can be found here: https://dev.twitter.com/docs/api/1.1/get/search/tweets). I imagine we would want to use since_id, but how would you update this value in the call based off the last ingested event?


aaronkorn
Splunk Employee

Thanks, I did that and it is only returning one event, but when I search twitter.com/search for that query (UPMC) I get multiple results back. What would be the best way to leverage a real-time stream for the keywords, or to only search for the most recent tweets?


Damien_Dallimor
Ultra Champion

Looks OK; I have the same setup in my Twitter test stanza and it works. What search are you using? Try searching over "all time" for "index=twitter sourcetype=twitter".


polifagbonanza
New Member

One of the issues may be that you have response_type set to xml even though you're pulling JSON data.
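
If that's the cause, the corresponding line in the stanza below would presumably just become:

response_type = json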


aaronkorn
Splunk Employee

Thanks for your response. Here's what we have so far:

[rest://Twitter]
auth_type = oauth1
endpoint = https://api.twitter.com/1.1/search/tweets.json
http_method = GET
index = twitter
index_error_response_codes = 1
oauth1_access_token = 1951974925-0Gmoi6JxxToMG4P7lEWX03xxxxxxxxxxxxxx
oauth1_access_token_secret = sYDyjNRz71Q0Wbbeni0RbuBoIQmUxxxxxxxxxxxxx
oauth1_client_key = vpIKhXBmLmqxxxxxxxxxxx
oauth1_client_secret = 0Vrp1WeP7g8NGewlTx2pMKcxxxxxxxxxxxxx
response_type = xml
sourcetype = twitter
streaming_request = 0
url_args = q=UPMC
polling_interval = 10
response_handler_args =


Damien_Dallimor
Ultra Champion

Can you post your inputs.conf stanza for the Twitter REST input that is not working for you?
