Python oneshot query to search for many values via API

drinkingjimmy · 05-20-2016

I'm trying to run a search that returns results when a field matches one of many predetermined values, and I'm worried about the logistics and resource cost of processing a large number of OR clauses.

I'm building an external app that is making calls to Splunk through the Python SDK, and I've found searching for a few expressions is pretty basic:

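import splunklib.client as client
import splunklib.results as results

service = client.connect(host="localhost", port=8089, username="admin", password="changeme")  # placeholder host/credentials
kwargs_oneshot = {"earliest_time": "-1h", "latest_time": "now"}
searchquery_oneshot = "search index=foo sourcetype=bar (url=google* OR url=yahoo* OR url=splunk*)"
oneshotsearch_results = service.jobs.oneshot(searchquery_oneshot, **kwargs_oneshot)
reader = results.ResultsReader(oneshotsearch_results)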

But what should I do if the external data source feeding my script provides 1000+ URLs?

I think a lookup table is probably the best route, but I can't find any way to upload it programmatically as part of the oneshot, and the list of URLs will likely be different each time I execute the script from my application, which resides on a different server. I don't want to have to ssh into my Splunk server before I run the job, especially since the external app will run autonomously.

Is there a way to upload a file as part of the oneshot API call, so I can just do a lookup against the index?

I guess the more relevant question is whether I should even bother. Would a set of 1000 OR clauses actually run faster than the lookup?

jkat54 · 05-23-2016

You can't upload the CSV and search it in a single API call.

Here's how I would do it. Using a single Python script, programmatically do the following:

Create a CSV with the URL data in a single column:

url <-- the column header becomes the field name in Splunk
url1
url2
url3
Configure the sourcetype in props.conf:

[urlCSV]
INDEXED_EXTRACTIONS = csv
DATETIME_CONFIG = CURRENT
POST the CSV into an index named URLINDEX using the oneshot input:

http://docs.splunk.com/Documentation/Splunk/6.4.1/RESTREF/RESTinput#POST_data.2Finputs.2Foneshot_met...
Use the URL data as a subsearch in your original search:

searchquery_oneshot = "search index=foo sourcetype=bar [search index=URLINDEX index_earliest=-15m | dedup url | fields url]"

Use index_earliest in the subsearch so that you only get the URL data that was just uploaded in steps 1-3.
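The steps above can be sketched with only the Python standard library. This is a hedged illustration, not a tested recipe: the function names build_url_csv, build_oneshot_request and build_search are hypothetical; the index is called urlindex because Splunk index names may only contain lowercase letters, digits, underscores and hyphens; and authentication is assumed to use a session key sent as "Authorization: Splunk <key>". One caveat from the REST reference: the name parameter of data/inputs/oneshot is a file path that must be readable by the Splunk server itself (the SDK's Index.upload wraps the same endpoint and carries the same note), so the CSV has to be written somewhere that server can read before the POST. If that isn't possible, the SDK's Index.submit, which posts raw event text to receivers/simple, is one alternative.

```python
import csv
import io
import urllib.parse
import urllib.request


def build_url_csv(urls):
    """Step 1: a one-column CSV; the 'url' header becomes the Splunk field name."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["url"])
    for url in urls:
        writer.writerow([url])
    return buf.getvalue()


def build_oneshot_request(base_url, session_key, server_path,
                          index="urlindex", sourcetype="urlCSV"):
    """Step 3: build a POST to data/inputs/oneshot (not sent here).

    server_path must point to a file the Splunk server can read locally;
    this endpoint does not carry the file contents in the request.
    """
    body = urllib.parse.urlencode(
        {"name": server_path, "index": index, "sourcetype": sourcetype}
    ).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/services/data/inputs/oneshot",
        data=body,
        headers={"Authorization": "Splunk " + session_key},
        method="POST",
    )


def build_search(index="urlindex", window="-15m"):
    """Step 4: filter the main search with the URLs indexed within `window`."""
    return ("search index=foo sourcetype=bar "
            f"[search index={index} index_earliest={window} "
            "| dedup url | fields url]")
```

Usage: write build_url_csv(urls) to a path the Splunk server can read, send the request with urllib.request.urlopen (the management port often uses a self-signed certificate, so you may need to pass an ssl context), then hand build_search() to service.jobs.oneshot as in the original question. Keep in mind that subsearch output is capped by the maxout setting in the [subsearch] stanza of limits.conf (10,000 results by default).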

drinkingjimmy · 05-23-2016

Your method would definitely work, but the comparison file may change on each run.

I'd like to find a way to do it programmatically, so I don't have to log in to Splunk separately to modify the input file and props.conf every time the job runs (which will likely be hourly).

jkat54 · 05-23-2016

That's why the Python script creates the CSV, uploads the data, and then runs the search each time. You won't need to change props.conf every time, just once, and by POSTing the CSV file to the oneshot endpoint you upload the comparison data on each run. Finally, by using index_earliest=-15m (or even index_earliest=-5m) in your subsearch, you ensure that you'll only see the data from the latest CSV, as long as consecutive runs are further apart than that window.

You would not need to log in to splunk separately.

The only issue with this is that you'll take the extra license hit. Thousands of URLs in each CSV might be 10-20 MB each run??? Not sure.
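Rather than guessing, the per-run size can be measured before anything is indexed. A minimal sketch, assuming license usage tracks roughly the bytes of the CSV being indexed (the header line is not an event, so this slightly overstates it); csv_payload_bytes and daily_license_mb are hypothetical helper names, and the 5,000 URLs of 60 characters are made-up example numbers.

```python
import csv
import io


def csv_payload_bytes(urls):
    """Size in bytes of the one-column url CSV, header line included."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["url"])
    for url in urls:
        writer.writerow([url])
    return len(buf.getvalue().encode("utf-8"))


def daily_license_mb(bytes_per_run, runs_per_day=24):
    """Rough daily ingest in MB (1 MB = 1024 * 1024 bytes); defaults to hourly runs."""
    return bytes_per_run * runs_per_day / (1024 * 1024)


# Hypothetical example: 5,000 URLs of 60 characters each, run hourly.
per_run = csv_payload_bytes(["x" * 60] * 5000)
print(per_run, round(daily_license_mb(per_run), 2))
```

With those hypothetical numbers the CSV is 305,004 bytes (about 0.3 MB) per run and roughly 7 MB per day, so a few thousand URLs per run would likely come in well under the 10-20 MB guess; the estimate scales linearly with URL count and length.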

jkat54 · 05-23-2016

Your other option is to drop the CSV on the search heads and use it as a lookup, but that seems far too complex. Finally, having thousands of ORs shouldn't be an issue, except that there may be a problem POSTing a string that long.
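If you go the OR route, the clause can be generated from the URL list and its size checked before sending. A sketch under stated assumptions: spl_quote, build_or_clause and build_or_search are hypothetical helpers; values are double-quoted with backslashes and quotes escaped, assuming SPL's backslash escaping inside quoted strings; and the terms are exact matches, not the google*-style prefix wildcards from the question. For what it's worth, the SDK's jobs.oneshot sends the query as a POST body rather than in the URL, so URL length limits shouldn't apply, though that is worth verifying against your SDK version.

```python
def spl_quote(value):
    """Double-quote a value for SPL, escaping backslashes and double quotes."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def build_or_clause(field, values):
    """Return '(field="v1" OR field="v2" ...)', dropping duplicate values."""
    unique = list(dict.fromkeys(values))  # de-duplicate while preserving order
    if not unique:
        raise ValueError("at least one value is required")
    return "(" + " OR ".join(f"{field}={spl_quote(v)}" for v in unique) + ")"


def build_or_search(urls):
    """Wrap the OR clause in the search from the original question."""
    return "search index=foo sourcetype=bar " + build_or_clause("url", urls)


query = build_or_search(["google.com", "yahoo.com", "google.com"])
print(query)
print(len(query.encode("utf-8")), "bytes to POST")
```

The resulting string can be passed to service.jobs.oneshot in place of searchquery_oneshot; printing its byte length for a real 1000+ URL list gives a concrete number to weigh against the CSV-plus-subsearch approach.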
