Why does this python search script run twice?

dwfarris
Explorer

My background: heavy Unix and shell, numerous programming languages, but new to Python and Splunk.

The intent of this script is to archive a CSV file into a separate directory with a date/time stamp, for retention.

The problem is that the script seems to run twice. First it runs BEFORE "outputcsv" has even started creating the output CSV file, and then again after the file has been created. I can live with it in this script, but for future Python scripts this is a problem. I need to understand why my script gets called twice in the following search string.

index=summary | outputcsv myfile | archcsv -c myfile -a temp # Should only run one time at end.

My Python search script looks for "myfile.csv" in /apps/splunk/var/run/splunk and moves it to the ../temp folder.

If there happens to be a myfile.csv in .../var/run/splunk when the search STARTS, the script moves that one FIRST, and then it is called again once the new myfile.csv has been created.

I know that Splunk is NOT Unix, but I feel that the pipe should NOT call the archcsv.py script until AFTER outputcsv has finished creating its myfile.csv file.
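
One way to see the double invocation for yourself is a minimal debugging sketch using only the standard library (the log path below is hypothetical): append a line at the very top of the script on every call, so the number and timing of calls, and the arguments, become visible.

# Hypothetical debugging aid: record each invocation of the custom command.
import os, sys, time

with open('/tmp/archcsv_invocations.log', 'a') as f:
    f.write('invoked pid=%d at %s argv=%r\n'
            % (os.getpid(), time.strftime('%Y-%m-%d %H:%M:%S'), sys.argv[1:]))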


local commands.conf entry
[pydebug]
type = python
filename = pydebug.py
streaming = false
retainsevents = true
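
One likely cause, though not confirmed in this thread, is that Splunk invokes custom search commands an extra time while generating preview results, so the command fires once on a preview pass before outputcsv has written the final file. commands.conf documents a run_in_preview attribute for this; a sketch, assuming the archive command's stanza is named archcsv:

[archcsv]
type = python
filename = archcsv.py
streaming = false
retainsevents = true
# Assumption: the extra run comes from preview generation; this attribute
# (default true) tells Splunk not to run the command for preview output.
run_in_preview = false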

UNIX Directory info with Comments:
[splunk]$ pwd
/apps/links/temp
[splunk]$ ls -ltr

[splunk]$ ls -altr /apps/splunk/var/run/splunk/csvstuff*
-rw------- 1 splunk users 12734095 Aug 4 13:08 /apps/splunk/var/run/splunk/csvstuff.csv

[splunk]$ # Now I will run the search, outputcsv and archive utility.
[splunk]$ # For some reason, it will copy the Existing csvstuff.csv and then the new one.
[splunk]$ pwd
/apps/links/temp
[splunk]$ ls -altr
total 22596
drwxr-xr-x 3 splunk users 4096 Jul 31 15:58 ..
-rw-r--r-- 1 splunk users 12734095 Aug 4 13:08 csvstuff_20140804131017.csv
-rw-r--r-- 1 splunk users 10392108 Aug 4 13:10 csvstuff_20140804131021.csv
drwxr-xr-x 2 splunk users 4096 Aug 4 13:10 .


Python script
#!/usr/bin/python

import sys, getopt, os, shutil, time
import splunk.Intersplunk

# Read the results that Splunk pipes into this command.
results, dummyresults, settings = splunk.Intersplunk.getOrganizedResults()

def main(argv):
    aarg = 0
    carg = 0
    archfold = 'subdir'
    csvfile = 'default.csv'

    options, remainder = getopt.getopt(argv, 'c:a:', ['csvfile=', 'archfold='])

    for opt, arg in options:
        if opt in ('-c', '--csvfile'):
            carg = 1
            csvfile = arg
        elif opt in ('-a', '--archfold'):
            aarg = 1
            archfold = arg

    # Both -c (csv file base name) and -a (archive folder) are required.
    if carg == 0 or aarg == 0:
        sys.exit(1)

    sdir = '/apps/splunk/var/run/splunk/'
    adir = '/apps/links/' + archfold + '/'
    sfile = sdir + csvfile + '.csv'
    # Timestamped destination name, e.g. csvstuff_20140804131021.csv
    afile = adir + csvfile + '_' + time.strftime('%Y%m%d%H%M%S') + '.csv'

    # Move the CSV into the archive folder (if it exists) and make it readable.
    if os.path.exists(sfile):
        shutil.move(sfile, afile)
        os.chmod(afile, 0o644)

    # Pass the piped-in results back out, dropping consecutive duplicates.
    newresults = []
    oldresult = None
    for result in results:
        if result != oldresult:
            newresults.append(result)
            oldresult = result

    splunk.Intersplunk.outputResults(newresults)

if __name__ == "__main__":
    main(sys.argv[1:])

[splunk]$ # Now, notice that the first file above (13:08, the same size as the old csvstuff.csv) is from BEFORE I ran the search command.
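
One defensive idea, not part of the original script (a rough sketch; the 60-second threshold and the helper name are hypothetical): have the script ignore any CSV that was not written in the last few moments, so a leftover file from an earlier run is not archived by the early invocation.

import os, time

def is_fresh(path, max_age_seconds=60):
    # True if path exists and was modified within the last max_age_seconds.
    try:
        return (time.time() - os.path.getmtime(path)) <= max_age_seconds
    except OSError:
        return False

# In the script above, archive only a freshly written file:
#   if is_fresh(sfile):
#       shutil.move(sfile, afile)
#       os.chmod(afile, 0o644)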


dominiquevocat
SplunkTrust

I have the same issue with outputcsv, which seems to stream its results. I have not tried it yet, but maybe if you run the actual search as a subsearch (in [ ]) and have your archive command run in the main search, the issue is mitigated, because the subsearch is run to completion first?
