Python oneshot query to search for many values via API

Explorer

I'm trying to run a search where I will get results if a field matches one of many predetermined values and I'm worried about the logistics and resources in processing a large number of OR clauses.

I'm building an external app that is making calls to Splunk through the Python SDK, and I've found searching for a few expressions is pretty basic:

import splunklib.results as results

# `service` is an authenticated splunklib.client.Service instance
kwargs_oneshot = {"earliest_time": "-1h", "latest_time": "now"}
searchquery_oneshot = "search index=foo sourcetype=bar (url=google* OR url=yahoo* OR url=splunk*)"
oneshotsearch_results = service.jobs.oneshot(searchquery_oneshot, **kwargs_oneshot)
reader = results.ResultsReader(oneshotsearch_results)

But what should I do if the external data source that feeds my script provides 1000+ URLs?

I think a lookup table is probably the best route, but I can't seem to find any way to programmatically upload it as part of the oneshot, and the list of URLs will likely be different each time I execute the script from my application which resides on a different server. I don't want to have to ssh into my Splunk server before I run the job, especially since the external app will run autonomously.

Is there a way to upload a file as part of the oneshot API call, so I can just do a lookup against the index?

I guess a more relevant question is whether I should even bother. Would a set of 1000 OR clauses actually run faster than the lookup?


SplunkTrust

You can't upload the CSV and run a search against it in a single call to the API.

Here's how I would do it. Using a single Python script, programmatically do the following:

  1. Create a CSV of the URL data with one column:

    url <-- column header will become field name in splunk
    url1
    url2
    url3

  2. Configure the sourcetype in props.conf:

    [urlCSV]
    INDEXED_EXTRACTIONS = csv
    DATETIME_CONFIG = CURRENT

  3. POST the CSV into an index named URLINDEX using the oneshot input:

    http://docs.splunk.com/Documentation/Splunk/6.4.1/RESTREF/RESTinput#POST_data.2Finputs.2Foneshot_met...

  4. Use the URL data as a subsearch to your original search:

    searchquery_oneshot = "search index=foo sourcetype=bar [search index=URLINDEX _index_earliest=-15m | dedup url | fields url]"

Use _index_earliest in the subsearch to get only the latest URL data that was just uploaded in steps 1-3.
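
Steps 1 and 4 can be sketched in plain Python. Treat this as a rough sketch rather than a tested integration: the helper names build_url_csv and build_subsearch_query are made up for illustration, urlindex is a placeholder for the URLINDEX index above, and the upload in step 3 is left out because it depends on how your script reaches the Splunk server.

```python
import csv
import io


def build_url_csv(urls):
    """Return CSV text with a single 'url' column (step 1)."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["url"])  # header row becomes the field name in Splunk
    for u in urls:
        writer.writerow([u])
    return buf.getvalue()


def build_subsearch_query(index="foo", sourcetype="bar",
                          url_index="urlindex", window="-15m"):
    """Return the step 4 search string; every default is a placeholder."""
    # _index_earliest is the Splunk time modifier that filters on index time,
    # so only rows indexed within `window` feed the outer search.
    subsearch = (
        f"search index={url_index} _index_earliest={window} "
        "| dedup url | fields url"
    )
    return f"search index={index} sourcetype={sourcetype} [{subsearch}]"


csv_text = build_url_csv(["google*", "yahoo*", "splunk*"])
query = build_subsearch_query()
```

The query string can then be passed to service.jobs.oneshot in the same way as the search in the original question.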


Explorer

Your method would definitely work, but the comparison file may change on each run.

I'd like to find a way to do it programmatically, so I don't have to log in separately to Splunk to modify the input file and props.conf every time the job runs (which will likely be hourly).


SplunkTrust

That's why the Python script creates the CSV, uploads the data, and then runs the search each time. You will not need to change props.conf every time, just once. By POSTing the CSV file to the oneshot endpoint, you upload the comparison data on each run. Finally, by using _index_earliest=-15m or even _index_earliest=-5m in your subsearch, you limit the subsearch to data indexed within that window, which should be only the latest CSV.

You would not need to log in to splunk separately.

The only issue with this is that you will take the extra license hit. Thousands of URLs in each CSV = ... maybe 10-20 MB each run? Not sure.


SplunkTrust

Your other option is dropping the CSV on the search heads and using it as a lookup, but that seems far too complex. Finally, having thousands of ORs shouldn't be an issue, except that there may be a problem posting a string that long.
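
If you go with the plain OR approach, the search string can be generated from the URL list instead of written by hand. A minimal sketch, assuming the same placeholder index, sourcetype and field as the original query; build_or_query is a hypothetical helper, and the quoting and escaping shown is my assumption about what is safe for typical values, not something verified against every input:

```python
def build_or_query(urls, index="foo", sourcetype="bar", field="url"):
    """Join the values into one parenthesized OR expression."""
    if not urls:
        raise ValueError("need at least one value to search for")
    terms = []
    for u in urls:
        # Assumption: double-quote each value and backslash-escape any
        # embedded backslashes or double quotes.
        escaped = u.replace("\\", "\\\\").replace('"', '\\"')
        terms.append(f'{field}="{escaped}"')
    return f"search index={index} sourcetype={sourcetype} ({' OR '.join(terms)})"


query = build_or_query(["google*", "yahoo*", "splunk*"])
```

With 1000+ values the resulting string gets long, so it is worth testing with a realistic list before relying on it.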
