I am ingesting a lot of Twitter data for a project, and I am incidentally picking up Japanese and Hindi tweets along with the English ones. I do not want to collect these tweets, so is there a way to limit the collection to English only?
Or is there a way to delete the non-English Twitter data after it has been indexed?
I'm using the Splunk Modular Data inputs for the REST API.
Thanks.
Use a filter! Twitter has a fantastic streaming API which you can use with Splunk. Check out this great tutorial: http://discoveredintelligence.ca/stream-twitter-splunk-10-simple-steps/
Use the language filter in your endpoint (https://dev.twitter.com/streaming/overview/request-parameters#language). For example:
https://stream.twitter.com/1.1/statuses/filter.json?track=twitterapi&language=en
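As a rough sketch of how that endpoint URL could be assembled, and how any stray non-English tweets could be dropped on the client side, something like the following Python may help. The helper names build_filter_url and is_english are made up for illustration, and the second helper assumes each tweet arrives as a JSON object carrying the "lang" field that v1.1 tweet objects include.

```python
import json
from urllib.parse import urlencode

# Streaming endpoint taken from the example URL above.
STREAM_URL = "https://stream.twitter.com/1.1/statuses/filter.json"


def build_filter_url(track, language="en"):
    """Hypothetical helper: build the endpoint URL with track and language set."""
    params = {"track": ",".join(track), "language": language}
    return f"{STREAM_URL}?{urlencode(params)}"


def is_english(raw_tweet):
    """Hypothetical fallback: check the tweet's own "lang" field (assumed present)."""
    tweet = json.loads(raw_tweet)
    return tweet.get("lang") == "en"


print(build_filter_url(["twitterapi"]))
```

The resulting URL is what you would paste into the endpoint setting of the REST input. The is_english check is only a fallback for data that was collected without the language parameter.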