Python SDK : How to retrieve search results by saved search name?

vickypandya · ‎08-02-2012

Hi folks,
I am new to Python and Splunk. I have been trying to get saved search results via the Splunk SDK for Python. I have tried job.py (an example in the SDK), which outputs the SID for every search job; each SID can then be used to find the search name and to fetch the results.

I have also tried a GET against /services/search/jobs, which returns a list of all the jobs. That is a lot of XML to parse just to find the desired search name.

Is there any other approach to get saved search results by search name rather than by search ID? If not, what options are available through the SDK?

Any help is much appreciated.

Thanks

sklass · ‎07-20-2016

Here is a loose example of how to do this. The snippet is lifted from the body of a function, hence the return statements near the end.

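    import datetime
    import json
    import logging

    from splunklib import client
    from splunklib.binding import HTTPError

    log = logging.getLogger(__name__)

    # Saved search definition; replace the placeholder with your own SPL.
    search_params = {'name': "Some lame search",
                     'search': "<FILL ME IN>",
                     'dispatch.ttl': 60 * 60 * 24 * 7}  # 7 days, in seconds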

    search_params_update = {
        'description': 'Some description',
        'is_scheduled': True,
        'cron_schedule': '0 1 * * *',      # Daily at 1am
        'schedule_window': 120,
    }

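    # SplunkAuth is not shown here; it is presumably a namedtuple holding the
    # connection settings (host, port, username, password) for client.connect().
    credentials = SplunkAuth._asdict()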
    service = client.connect(**credentials)

    try:
        saved_search = service.saved_searches.create(**search_params)
    except HTTPError as err:
        if "A saved search with that name already exists." not in "{}".format(err):
            log.warning("Unable to set off search - {}".format(" :: ".join("{}".format(err).split("\n"))))
            raise
        else:
            saved_search = service.saved_searches[search_params.get('name')]
            update_required = False
            for k, v in search_params_update.items():
                if saved_search.content.get(k) != v:
                    update_required = True
                    break
            if update_required:
                saved_search.update(**search_params_update).refresh()
    else:
        saved_search.update(**search_params_update).refresh()

    # Do we have a job that is ready to go..
    job_data = json.load(service.jobs.get(output_mode='json').get('body'))
    completed_jobs = [x for x in job_data.get('entry') if x.get('content', {}).get('label') == search_params['name']
                      and x.get('content', {}).get('isDone')]
    try:
        latest = completed_jobs[0]
        last_update = datetime.datetime.strptime(latest.get('published').rpartition("-")[0], "%Y-%m-%dT%H:%M:%S.%f")
        if (datetime.datetime.now() - last_update).total_seconds() > 60 * 60 * 12:
            log.info("Launching new job it's pretty old. {}".format(last_update))
            saved_search.dispatch()
        log.info("Getting latest completed job {}".format(latest.get('updated')))
        job = service.jobs[latest.get('content').get('sid')]
    except (IndexError, KeyError):
        # No completed job yet (IndexError) or its SID has expired (KeyError).
        # What do we have in progress?
        in_process_jobs = [x for x in job_data.get('entry') if
                          x.get('content', {}).get('label') == search_params['name']
                          and not x.get('content', {}).get('isDone')]
        if not in_process_jobs:
            saved_search.dispatch()
            log.info("New Job has been dispatched")
            return {'message': "Job has been dispatched"}
        else:
            in_process_job = in_process_jobs[-1]
            log.info("Job previously dispatched and is at {:.2%}".format(
                in_process_job.get('content', {}).get('doneProgress')))
            return {'message': "Job previously dispatched and is at {:.2%}".format(
                in_process_job.get('content', {}).get('doneProgress'))}
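
The snippet above stops once it has a finished `job` and does not read any rows from it. A minimal sketch of that last step follows, assuming `job` is a `splunklib` job object whose `results()` method returns a readable stream, and assuming the JSON payload keeps its rows under a top-level `results` key. `read_results` is a hypothetical helper name, not part of the SDK.

```python
import json


def read_results(job, count=0):
    """Return the rows of a finished search job as a list of dicts.

    count=0 asks the server for all available rows (server-side limits
    may still apply). The "results" key is an assumption about the shape
    of the JSON payload.
    """
    stream = job.results(output_mode="json", count=count)
    payload = json.load(stream)
    return payload.get("results", [])
```

With the variables from the snippet above, `rows = read_results(job)` would then give one dict per result row, keyed by field name.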

hexx · ‎08-07-2012

To add to Andrea's answer, search results can only be retrieved by referencing the search ID of your search job, through the /services/search/jobs/{search_id} endpoint and its sub-nodes such as /services/search/jobs/{search_id}/results.

For more detailed information, take a look at the endpoints listed for /services/search/jobs.

You should be able to achieve this goal with this sort of pseudo-code:

1. List all search jobs with a GET against /services/search/jobs/
2. Identify the search jobs that match the saved search name that you are looking for (isSaved=1 AND label={saved search name})
3. Pick the most recent search job for your saved search. It will be the one with the most recent epoch time embedded in its search ID. Example: sid=admin__admin__search_dGVzdCA0_1343881451.4909
4. Use that SID to access the results of your search with a GET against /services/search/jobs/{search_id}/results
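
The filtering and "most recent" steps above can be sketched in plain Python. This is only an illustration under stated assumptions: `entries` is the `entry` list from the jobs listing parsed as JSON, each entry carries `label`, `isSaved` and `sid` under `content`, and the epoch time is the last 10-digit number in the SID (as in the example SID above). The helper names `sid_epoch` and `latest_sid_for` are hypothetical.

```python
import re

# Assumption: the dispatch time shows up in the SID as a 10-digit epoch,
# optionally followed by a fractional part.
_EPOCH = re.compile(r"(?<!\d)\d{10}(?:\.\d+)?(?!\d)")


def sid_epoch(sid):
    """Return the last epoch-looking number in a SID, or 0.0 if there is none."""
    matches = _EPOCH.findall(sid)
    return float(matches[-1]) if matches else 0.0


def _truthy(value):
    # XML output uses "1"/"0", JSON output uses true/false.
    return str(value).lower() in ("1", "true")


def latest_sid_for(entries, saved_search_name, require_saved=True):
    """Return the SID of the most recent job labelled with the saved search name."""
    candidates = []
    for entry in entries:
        content = entry.get("content", {})
        if content.get("label") != saved_search_name:
            continue
        if require_saved and not _truthy(content.get("isSaved")):
            continue
        candidates.append(content["sid"])
    return max(candidates, key=sid_epoch) if candidates else None
```

The returned SID is what goes into the final GET against the results endpoint. Passing `require_saved=False` drops the isSaved condition, which may be useful if the jobs you expect are being filtered out.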

Note that these tasks can be made easier by using one of our SDKs such as the Python SDK.

You'll probably want to read more about the "job" and "jobs" classes, along with their methods, in the Python SDK reference documentation.

apruneda_splunk · ‎08-02-2012

Check out the topic: "How to search your data using the Python SDK".

There are code examples that show how to run a saved search and see the results, and how to list your search jobs and get those results. The beginning of the topic explains the difference between a saved search and a search job.

However, for a job, the SID is very important. One saved search can produce many jobs, so the name of the saved search is not a unique identifier for a job. But if you want to see the search name for each search job, you could modify the code sample that lists the search jobs (which prints each job.sid) and have it also display the job's name (job.name).
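
Going the other way, from a saved search name to its jobs, may not require listing every job. The sketch below assumes `service` is an already connected `splunklib.client` service and relies on the saved search entity's `history()` method, which returns the jobs dispatched from that saved search. `finished_jobs_for` is a hypothetical helper name.

```python
def finished_jobs_for(service, saved_search_name):
    """Return (sid, job) pairs for the finished jobs of one saved search.

    Raises KeyError if no saved search with that name is visible
    to the connected user.
    """
    saved_search = service.saved_searches[saved_search_name]
    return [(job.sid, job)
            for job in saved_search.history()
            if job.is_done()]
```

Each returned job still has its own SID, so the point above stands: the name narrows the list down, and the SID identifies the one job whose results you read.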
