Why does custom python script execute twice?

phoenixdigital · ‎03-05-2012

Some background....

I want an alert that triggers only when two independent searches both return data.
Search 1 = a simple search contained within the alert/saved search
Search 2 = a search triggered with the results of search 1, using a custom search command written in Python

The script will do the following:

Check the results passed in (from search 1) and, if there are none, return empty data (not coded yet, but easy)
If results are present, authenticate to Splunk and perform search 2
Take the results of search 2 and append them to the search 1 data passed into the Python script
Return the data to Splunk and alert
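
The four steps above can be sketched in plain Python, independent of Splunk. This is only an illustration under assumptions: `combine_searches` and `run_second_search` are hypothetical names, results are modelled as lists of dicts, and the `station` join key is assumed from the search used later in this post. Returning nothing when search 2 is empty is also an assumption, based on the stated goal that both searches must have data.

```python
def combine_searches(first_results, run_second_search, key="station"):
    """Return first_results enriched with fields from a second search.

    first_results     -- list of dicts from search 1
    run_second_search -- hypothetical callable returning a list of dicts
    key               -- field assumed to be shared by both result sets
    """
    # Step 1: nothing came in from search 1, so return empty data (no alert).
    if not first_results:
        return []

    # Step 2: only now pay the cost of running search 2.
    second_results = run_second_search()
    if not second_results:
        # Assumption: the alert should fire only when both searches have data.
        return []

    # Step 3: index the search 2 rows by the shared key, then append their
    # fields to the matching search 1 rows (prefixed to avoid clobbering).
    by_key = dict((row.get(key), row) for row in second_results)
    combined = []
    for row in first_results:
        merged = dict(row)
        extra = by_key.get(row.get(key), {})
        for field, value in extra.items():
            if field != key:
                merged["search2_" + field] = value
        combined.append(merged)

    # Step 4: the caller hands this list back to Splunk.
    return combined
```

Passing the second search in as a callable keeps the merge logic testable without a running Splunk instance.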

Results
When I run the following search from the Splunk web console:

sourcetype="holdingRegisters" SPLUNK=StationStatusCoil | dedup station | stationstartcheck __EXECUTE__

the log file shows that Splunk runs the script twice, and I can't see why.

So, my question:

Why is the script running twice when I perform a search in Splunk?

Note that the script is just a skeleton of what I plan to do, but the main components are there. It works, but for now it just fills the original data with junk; I plan to fill it with the results of the second search.

I am using Splunk 4.3, and my custom search script is below.

import csv
import sys
import splunk.Intersplunk
import string
import datetime
import splunk.auth, splunk.search
import time

start = time.time()

# open logfile
f = open('/tmp/stationStartCheck.log', 'w+')
f.write(str(time.time() - start ) + ' - Starting\n')
f.write(str(time.time() - start ) + ' - argv length ' + str(len(sys.argv)) + '\n')

(isgetinfo, sys.argv) = splunk.Intersplunk.isGetInfo(sys.argv)

if isgetinfo:
    splunk.Intersplunk.outputInfo(False, False, True, False, None, True)
    # outputInfo automatically calls sys.exit()

# check existing data to determine which station we are dealing with
try:
    f.write(str(time.time() - start ) + ' - Getting results from Splunk\n')
    results = splunk.Intersplunk.readResults(None, None, True)
    f.write(str(time.time() - start ) + ' - Success\n')

    f.write(str(time.time() - start ) + ' - Size of resultset' + str(len(results)) + '\n')

except Exception, e:
    splunk.Intersplunk.generateErrorResults("Unhandled exception:  %s" % (e,))


# perform secondary search and output results if any
try:
    f.write(str(time.time() - start ) + ' - Authenticating....\n')
    key = splunk.auth.getSessionKey('admim','changeme')

    f.write(str(time.time() - start ) + ' - Sending Search....\n')
    my_job = splunk.search.dispatch('search sourcetype="holdingRegisters" SPLUNK=StationStatusCoil | dedup station', namespace='search', earliestTime='-1h', maxEvents=10)

    while not my_job.isDone:
        f.write(str(time.time() - start ) + ' - Waiting for results....\n')
        time.sleep(1)

    f.write(str(time.time() - start ) + ' - Results returned' + str(my_job.resultCount) + '\n')

    for result in my_job.results:
        f.write(str(result['station']) + '\n')


    for i in range(len(results)):
        f.write(str(time.time() - start ) + ' - Adding field to original result set\n')
        results[i]['newField'] = 'uno'

    splunk.Intersplunk.outputResults(results)
    my_job.cancel()

except Exception, e:
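    # generateErrorResults only builds the error rows; pass them to outputResults so Splunk sees them
    splunk.Intersplunk.outputResults(splunk.Intersplunk.generateErrorResults("Unhandled exception:  %s" % (e,)))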

# close logfile
f.close()

When I run this search in the Splunk search bar:

sourcetype="holdingRegisters" SPLUNK=StationStatusCoil | dedup station | stationstartcheck __EXECUTE__

the log file contains the following output:

    tail: /tmp/stationStartCheck.log: file truncated
    Starting
    argv length 2
    0.00281405448914 - Getting results from Splunk
    0.00357985496521 - Success
    0.00358605384827 - Size of resultset4
    0.00359296798706 - Authenticating....
    0.01722407341 - Sending Search....
    0.302284002304 - Waiting for results....
    1.42954897881 - Results returned4
    Station1
    Station2
    Station3
    Station4
    1.47054505348 - Adding field to original result set
    1.47056603432 - Adding field to original result set
    1.4705760479 - Adding field to original result set
    1.47058486938 - Adding field to original result set
    tail: /tmp/stationStartCheck.log: file truncated
    Starting
    argv length 2
    0.000216960906982 - Getting results from Splunk
    0.000913143157959 - Success
    0.000919103622437 - Size of resultset4
    0.00092601776123 - Authenticating....
    0.0146579742432 - Sending Search....
    0.293842077255 - Waiting for results....
    1.30643010139 - Results returned4
    Station1
    Station2
    Station3
    Station4
    1.32303500175 - Adding field to original result set
    1.32305502892 - Adding field to original result set
    1.32306408882 - Adding field to original result set
    1.32308411598 - Adding field to original result set

Any ideas on how to stop it from running twice?
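
One detail in the log above: each `tail: ... file truncated` line appears because the script opens the log with mode `'w+'`, which empties the file on every invocation. To tell separate invocations apart while debugging, a minimal sketch like the following may help. It assumes nothing about Splunk; `log_invocation` is a hypothetical helper that appends instead of truncating and tags every line with the process ID and the arguments received.

```python
import os
import time


def log_invocation(path, argv, message):
    """Append one line describing this invocation to the log at path."""
    line = "%.3f pid=%d argv=%r %s\n" % (
        time.time(), os.getpid(), list(argv[1:]), message)
    # Mode 'a' keeps earlier runs in the file; 'w+' would wipe them.
    with open(path, "a") as handle:
        handle.write(line)
    return line
```

Called near the top of the script as `log_invocation('/tmp/stationStartCheck.log', sys.argv, 'Starting')`, it leaves one line per run, so two runs with different PIDs or different arguments become easy to spot.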

eashwar · ‎03-23-2013

Very nice example. I can now understand and write some Python scripts using the Splunk SDK. Thanks, phoenixdigital.

asingla · ‎03-08-2012

I have the same issue as posted here. I noticed that it works fine if you do not use dedup, but I have a case where I need the dedup command. Please post here if you find a solution.

phoenixdigital · ‎03-06-2012

Thank you for the suggestion, but unfortunately that did not work. The script still runs twice.

However, on closer inspection of a similar script, I noticed that on the first pass it received 10 data points, and on the second pass it received 17.

So it appears Splunk is splitting up the results, which is a shame, as I would prefer my script to receive all data points at once for a given search.

Digging deeper, I can see that Splunk sends through one set of results and then, as it collects more, sends the original set plus the additional results again. Below is a log of the _time and RRP values of the two batches of data sent to the custom search.

    Size of resultset 10

    Record for _time, RRP 2012-03-07 11:00:00, 19.43165
    Record for _time, RRP 2012-03-07 11:05:00, 19.72373
    Record for _time, RRP 2012-03-07 11:10:00, 20.4553
    Record for _time, RRP 2012-03-07 11:15:00, 20.44109
    Record for _time, RRP 2012-03-07 11:20:00, 20.44642
    Record for _time, RRP 2012-03-07 11:25:00, 20.14813
    Record for _time, RRP 2012-03-07 11:30:00, 19.8667
    Record for _time, RRP 2012-03-07 11:35:00, 19.60739
    Record for _time, RRP 2012-03-07 11:40:00, 19.40553
    Record for _time, RRP 2012-03-07 11:45:00, 19.48035

    Size of resultset 17
    Record for _time, RRP 2012-03-07 10:25:00, 17.72382
    Record for _time, RRP 2012-03-07 10:30:00, 18.189
    Record for _time, RRP 2012-03-07 10:35:00, 17.80982
    Record for _time, RRP 2012-03-07 10:40:00, 18.44075
    Record for _time, RRP 2012-03-07 10:45:00, 18.7983
    Record for _time, RRP 2012-03-07 10:50:00, 19.32571
    Record for _time, RRP 2012-03-07 10:55:00, 19.36478
    Record for _time, RRP 2012-03-07 11:00:00, 19.43165
    Record for _time, RRP 2012-03-07 11:05:00, 19.72373
    Record for _time, RRP 2012-03-07 11:10:00, 20.4553
    Record for _time, RRP 2012-03-07 11:15:00, 20.44109
    Record for _time, RRP 2012-03-07 11:20:00, 20.44642
    Record for _time, RRP 2012-03-07 11:25:00, 20.14813
    Record for _time, RRP 2012-03-07 11:30:00, 19.8667
    Record for _time, RRP 2012-03-07 11:35:00, 19.60739
    Record for _time, RRP 2012-03-07 11:40:00, 19.40553
    Record for _time, RRP 2012-03-07 11:45:00, 19.48035
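
The two batches above can also be compared mechanically. The following sketch is an illustration, not part of the original script: `contains_batch` is a hypothetical helper, and the `_time` and `RRP` field names are taken from the log. It checks whether a later batch contains every record of an earlier one. If it returns true, one reasonable reading is that the script is being handed a growing result set, so each invocation is better treated as a full recomputation than as a new increment.

```python
def contains_batch(earlier, later, fields=("_time", "RRP")):
    """Return True if every record in earlier also appears in later."""
    def signature(record):
        # Reduce a record to the fields being compared.
        return tuple(record.get(name) for name in fields)

    later_signatures = set(signature(record) for record in later)
    return all(signature(record) in later_signatures for record in earlier)


# A few records from each batch in the log above (values copied verbatim).
first_batch = [
    {"_time": "2012-03-07 11:00:00", "RRP": "19.43165"},
    {"_time": "2012-03-07 11:05:00", "RRP": "19.72373"},
]
second_batch = [
    {"_time": "2012-03-07 10:55:00", "RRP": "19.36478"},
    {"_time": "2012-03-07 11:00:00", "RRP": "19.43165"},
    {"_time": "2012-03-07 11:05:00", "RRP": "19.72373"},
]
```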

phoenixdigital · ‎03-06-2012

Looking at the original script, it receives the same set of data on both runs.

    sourcetype="holdingRegisters" SPLUNK=StationStatusCoil | dedup station | stationstartcheck __EXECUTE__

MuS · ‎03-06-2012

Hi phoenixdigital

You can set the following in commands.conf:

supports_getinfo = false
streaming = false

If you need to keep supports_getinfo enabled, you can add the following to your script:

(isgetinfo, sys.argv) = splunk.Intersplunk.isGetInfo(sys.argv)
if isgetinfo:
    splunk.Intersplunk.outputInfo(False, False, True, False, None)
else:
    # do your thing

cheers
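
To make the `supports_getinfo` handshake concrete, here is a simplified stand-in written in plain Python. It is not the `splunk.Intersplunk` implementation: `classify_invocation` and `run` are hypothetical names, and the `__GETINFO__` marker is an assumption modelled on the `__EXECUTE__` argument visible in the search in the question. The idea it illustrates is that, with getinfo enabled, a script may be started once to describe itself and again to process results, so anything done before the `isgetinfo` check (such as opening a log file) would happen in both runs.

```python
def classify_invocation(argv):
    """Split argv into (phase, remaining_args).

    phase is 'getinfo' when the script is only asked to describe itself,
    'execute' when it should process results, and 'unknown' otherwise.
    """
    if len(argv) >= 2 and argv[1] == "__GETINFO__":
        return "getinfo", argv[2:]
    if len(argv) >= 2 and argv[1] == "__EXECUTE__":
        return "execute", argv[2:]
    return "unknown", argv[1:]


def run(argv, log):
    """Record work only in the execute phase; return what was done."""
    phase, _args = classify_invocation(argv)
    if phase == "getinfo":
        return "described"   # no logging, no searches
    if phase != "execute":
        return "ignored"
    log.append("processing results")
    return "processed"
```

Keeping side effects behind the phase check means a describe-only run leaves no trace in the log.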

jbrocks · ‎10-18-2022

Hi, is there an equivalent solution for splunklib?
