Why does this python search script run twice?

dwfarris
Explorer

My background: heavy Unix and shell, numerous programming languages, but new to Python and Splunk.

The intent of this script is to archive a CSV file into a separate directory with a date/time stamp, for retention.

The problem is that the script seems to run twice. First it runs BEFORE "outputcsv" has even started creating the output CSV file, and then again after the file has been created. I can live with it in this script, but for future Python scripts this is a problem. I need to understand why my script gets called twice in the following search string.

index=summary | outputcsv myfile | archcsv -c myfile -a temp # Should only run one time at end.

My Python search script looks for "myfile.csv" in /apps/splunk/var/run/splunk and moves it to the ../temp folder.

If there happens to be a myfile.csv in .../var/run/splunk when the search STARTS, the script moves that one FIRST, and then it is called again once the new myfile.csv has been created.

I know that Splunk is NOT Unix, but I feel that the pipe should NOT call the archcsv.py script until AFTER outputcsv has finished creating its myfile.csv file.
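
One way to see the double invocation for yourself is a minimal debugging sketch using only the standard library (the log path below is hypothetical): append a line at the very top of the script on every call, so the number and timing of calls, and the arguments, become visible.

# Hypothetical debugging aid: record each invocation of the custom command.
import os, sys, time

with open('/tmp/archcsv_invocations.log', 'a') as f:
    f.write('invoked pid=%d at %s argv=%r\n'
            % (os.getpid(), time.strftime('%Y-%m-%d %H:%M:%S'), sys.argv[1:]))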


local commands.conf entry
[pydebug]
type = python
filename = pydebug.py
streaming = false
retainsevents = true
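
One likely cause, though not confirmed in this thread, is that Splunk invokes custom search commands an extra time while generating preview results, so the command fires once on a preview pass before outputcsv has written the final file. commands.conf documents a run_in_preview attribute for this; a sketch, assuming the archive command's stanza is named archcsv:

[archcsv]
type = python
filename = archcsv.py
streaming = false
retainsevents = true
# Assumption: the extra run comes from preview generation; this attribute
# (default true) tells Splunk not to run the command for preview output.
run_in_preview = false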

UNIX Directory info with Comments:
[splunk]$ pwd
/apps/links/temp
[splunk]$ ls -ltr

[splunk]$ ls -altr /apps/splunk/var/run/splunk/csvstuff*
-rw------- 1 splunk users 12734095 Aug 4 13:08 /apps/splunk/var/run/splunk/csvstuff.csv

[splunk]$ # Now I will run the search, outputcsv and archive utility.
[splunk]$ # For some reason, it will copy the Existing csvstuff.csv and then the new one.
[splunk]$ pwd
/apps/links/temp
[splunk]$ ls -altr
total 22596
drwxr-xr-x 3 splunk users 4096 Jul 31 15:58 ..
-rw-r--r-- 1 splunk users 12734095 Aug 4 13:08 csvstuff_20140804131017.csv
-rw-r--r-- 1 splunk users 10392108 Aug 4 13:10 csvstuff_20140804131021.csv
drwxr-xr-x 2 splunk users 4096 Aug 4 13:10 .


Python script
#!/usr/bin/python

import sys, getopt, os, shutil, time
import splunk.Intersplunk

# Read the results that Splunk pipes into this command.
results, dummyresults, settings = splunk.Intersplunk.getOrganizedResults()

def main(argv):
    aarg = 0
    carg = 0
    archfold = 'subdir'
    csvfile = 'default.csv'

    options, remainder = getopt.getopt(argv, 'c:a:', ['csvfile=', 'archfold='])

    for opt, arg in options:
        if opt in ('-c', '--csvfile'):
            carg = 1
            csvfile = arg
        elif opt in ('-a', '--archfold'):
            aarg = 1
            archfold = arg

    # Both -c (csv file base name) and -a (archive folder) are required.
    if carg == 0 or aarg == 0:
        sys.exit(1)

    sdir = '/apps/splunk/var/run/splunk/'
    adir = '/apps/links/' + archfold + '/'
    sfile = sdir + csvfile + '.csv'
    # Timestamped destination name, e.g. csvstuff_20140804131021.csv
    afile = adir + csvfile + '_' + time.strftime('%Y%m%d%H%M%S') + '.csv'

    # Move the CSV into the archive folder (if it exists) and make it readable.
    if os.path.exists(sfile):
        shutil.move(sfile, afile)
        os.chmod(afile, 0o644)

    # Pass the piped-in results back out, dropping consecutive duplicates.
    newresults = []
    oldresult = None
    for result in results:
        if result != oldresult:
            newresults.append(result)
            oldresult = result

    splunk.Intersplunk.outputResults(newresults)

if __name__ == "__main__":
    main(sys.argv[1:])

[splunk]$ # Now, notice that the first file above (13:08, the same size as the old csvstuff.csv) is from BEFORE I ran the search command.
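
One defensive idea, not part of the original script (a rough sketch; the 60-second threshold and the helper name are hypothetical): have the script ignore any CSV that was not written in the last few moments, so a leftover file from an earlier run is not archived by the early invocation.

import os, time

def is_fresh(path, max_age_seconds=60):
    # True if path exists and was modified within the last max_age_seconds.
    try:
        return (time.time() - os.path.getmtime(path)) <= max_age_seconds
    except OSError:
        return False

# In the script above, archive only a freshly written file:
#   if is_fresh(sfile):
#       shutil.move(sfile, afile)
#       os.chmod(afile, 0o644)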


dominiquevocat
SplunkTrust

I have the same issue with outputcsv, which seems to stream its results. I have not tried it yet, but maybe if you run the actual search as a subsearch (in [ ]) and have your archive command run in the main search, the issue is mitigated, because the subsearch is run to completion first?
