Splunk GUI Rest API Twitter Integration

aaronkorn
Splunk Employee

Hello,

We are looking to use the Splunk REST GUI to connect to Twitter and monitor feeds based on several URL parameters we want to search for.

We have the endpoint defined as https://api.twitter.com/1.1/search/tweets.json, have our authentication credentials entered, and have a sample URL argument of q=UPMC to search Twitter for anything mentioning UPMC, returning in XML format. No data is returned, though, yet when I use the Twitter dev console it works fine: https://dev.twitter.com/console. Is anyone else having issues using the GUI REST integration, or does anyone have a better way to pull in Twitter data based on keywords? Should we worry about defining a response handler and other options in the config?

Thanks!

1 Solution

Damien_Dallimor
Ultra Champion

Well, you will be getting multiple events in the response document, but they are being indexed in Splunk as one single event. That is why the REST API Modular Input has custom response handlers that you can plug in to parse the specific response you are getting back, i.e. split out the individual Twitter events from the JSON response.
You add your custom response handler to bin/responsehandlers.py and declare it on the setup page for your REST input definition.
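
For reference, a hedged sketch of how that declaration might end up looking in your inputs.conf stanza once saved from the setup page (the response_handler key name is an assumption, inferred from the response_handler_args key shown later in this thread):

[rest://Twitter]
response_handler = TwitterEventHandler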

Here is an example of what a custom handler might look like for the Twitter JSON response:

class TwitterEventHandler:

    def __init__(self, **args):
        pass

    def __call__(self, response_object, raw_response_output, response_type, req_args, endpoint):

        if response_type == "json":
            # json and print_xml_stream are already available in
            # bin/responsehandlers.py, where this class lives
            output = json.loads(raw_response_output)
            last_tweet_indexed_id = 0
            # split the single JSON response document into individual tweet events
            for twitter_event in output["statuses"]:
                print_xml_stream(json.dumps(twitter_event))
                if "id_str" in twitter_event:
                    # id_str is a string, so compare numerically
                    tweet_id = int(twitter_event["id_str"])
                    if tweet_id > last_tweet_indexed_id:
                        last_tweet_indexed_id = tweet_id

            if "params" not in req_args:
                req_args["params"] = {}

            # only request tweets newer than the last one we indexed
            req_args["params"]["since_id"] = last_tweet_indexed_id

        else:
            print_xml_stream(raw_response_output)

I see that the raw response back from Twitter also has a created_at field for each event, which you can then use as your Splunk event time value.
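
As a hedged props.conf sketch for using created_at as the event timestamp for the twitter sourcetype (the TIME_PREFIX regex and strptime pattern are assumptions based on Twitter's v1.1 date format, e.g. "Mon Sep 24 03:35:21 +0000 2012"):

[twitter]
TIME_PREFIX = "created_at"\s*:\s*"
TIME_FORMAT = %a %b %d %H:%M:%S %z %Y
MAX_TIMESTAMP_LOOKAHEAD = 40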


Damien_Dallimor
Ultra Champion

Nice. Updated the code example.


aaronkorn
Splunk Employee

id_str instead of id


aaronkorn
Splunk Employee

Awesome! Just had to modify it slightly for the different id. Thanks for all your help!


Damien_Dallimor
Ultra Champion

The REST Mod Input is generic; it can be used in an unknown number of scenarios, so I have to provide an extension mechanism for handling specific behaviours. This is the purpose of responsehandlers.py. Custom handling can also extend beyond output formatting to dynamically calculating URL arguments to add to the request, such as the "since_id" argument, which you need to calculate based on the latest tweet id that you processed. I suggested two ways of performing this above, simple and more advanced, and updated the above untested code snippet to show how this might potentially be done.


aaronkorn
Splunk Employee

Ok. Where should we start in the responsehandlers.py script? I guess I'm not fully understanding the purpose of the responsehandlers.py file and how we would go about passing this to the inputs.conf file.


Damien_Dallimor
Ultra Champion

Maintain a variable in responsehandlers.py that stores the last tweet id, and then use this as the since_id for your next request, repeating this iteratively.

More advanced, but you could potentially also use the Splunk Python SDK from the response handler to execute a Splunk search and ask it for the latest tweet id that you indexed, to use as your since_id.

Also, you can update your REST stanza's "url_args" using the Python SDK, so if you need to persist the since_id value back into your configuration (i.e. to survive restarts), you can do that as well.
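
As an untested sketch of those two SDK ideas (connection details, the search string, and the stanza name are placeholder assumptions for your environment):

import splunklib.client as client
import splunklib.results as results

def latest_indexed_tweet_id(service):
    # ask Splunk for the highest tweet id indexed so far
    reader = results.ResultsReader(service.jobs.oneshot(
        "search index=twitter sourcetype=twitter | stats max(id_str) as max_id"))
    for result in reader:
        if isinstance(result, dict) and result.get("max_id"):
            return int(result["max_id"])
    return 0

def persist_since_id(service, since_id):
    # write since_id back into the REST stanza so it survives restarts
    # (url_args is a comma-delimited key=value list, as in the stanza below)
    stanza = service.confs["inputs"]["rest://Twitter"]
    stanza.update(url_args="q=UPMC,since_id=%d" % since_id)

service = client.connect(host="localhost", port=8089,
                         username="admin", password="changeme")
persist_since_id(service, latest_indexed_tweet_id(service))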


aaronkorn
Splunk Employee

Thanks! This seems to be working pretty well now, but we seem to be ingesting duplicate tweets every time it executes the API call. Is there any way around this? I know there are several different parameters to pass in the request (they can be found here: https://dev.twitter.com/docs/api/1.1/get/search/tweets). I imagine we would want to use since_id, but how would you update this value in the call based off the last ingested event?


aaronkorn
Splunk Employee

Thanks, I did that and it is only returning one event, but when I search twitter.com/search for that query (UPMC) I get multiple results back. What would be the best way to leverage a real-time stream for the keywords, or to only search for the most recent tweets?


Damien_Dallimor
Ultra Champion

Looks OK; I have the same setup in my Twitter test stanza and it works. What search are you using? Try searching over "all time" for "index=twitter sourcetype=twitter".


polifagbonanza
New Member

One of the issues may be that you have response_type set to xml even though you're pulling JSON data.
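
If that's the cause, the corresponding line in the stanza below would presumably just become:

response_type = json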


aaronkorn
Splunk Employee

Thanks for your response. Here's what we have so far:

[rest://Twitter]
auth_type = oauth1
endpoint = https://api.twitter.com/1.1/search/tweets.json
http_method = GET
index = twitter
index_error_response_codes = 1
oauth1_access_token = 1951974925-0Gmoi6JxxToMG4P7lEWX03xxxxxxxxxxxxxx
oauth1_access_token_secret = sYDyjNRz71Q0Wbbeni0RbuBoIQmUxxxxxxxxxxxxx
oauth1_client_key = vpIKhXBmLmqxxxxxxxxxxx
oauth1_client_secret = 0Vrp1WeP7g8NGewlTx2pMKcxxxxxxxxxxxxx
response_type = xml
sourcetype = twitter
streaming_request = 0
url_args = q=UPMC
polling_interval = 10
response_handler_args =


Damien_Dallimor
Ultra Champion

Can you post your inputs.conf stanza for the Twitter REST input that is not working for you?
