Splunk Search

Is it possible to pass a csv reader object as an argument from a Splunk Python script to another Python script?

Path Finder

I have the following two Python scripts: namelookupWrapper.py and namelookup.py.

namelookupWrapper.py takes "memberId" and "memberName" as input from the Splunk Web interface and contains the following code snippet:

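import csv
import os
import sys

idf = sys.argv[1]
namef = sys.argv[2]
# python_executable (the interpreter to launch) is not shown in this snippet
real_script = "C:\\Splunk\\etc\\apps\\search\\bin\\namelookup.py"
r = csv.reader(sys.stdin)
os.execv(python_executable, [python_executable, real_script] + sys.argv[1:])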

I am wondering how I would pass the csv reader object "r" as an argument, using os.execv(), to another Python script, i.e. namelookup.py.

Using the csv reader object, I iterate through the input in namelookupWrapper.py, which looks like this:

[MemberId, MemberName] 
[123,       ] 
[456,       ] 
[989,       ] 

I have another script, namelookup.py, running under Python 2.7, which uses pyodbc to retrieve member names from a database for the member IDs received by namelookupWrapper.py.

Please note the reason I need to do this: Splunk's Python 2.6 hosts namelookupWrapper.py, while the real script, namelookup.py, runs under Python 2.7, which has the pyodbc connection.

Splunk Employee

Generally speaking, no.

This is more of a Python / operating system question than a Splunk one (not that I mind), so you may get cleverer answers in Python circles, but here are the pieces I know.

The csv reader is a Python library construct, so it exists only inside the program, with its state stored in that process's memory. The operating system has no idea that it exists. The operating system does know about files, but not about the code interpreting them.

A separate process will not have access to any in-memory information from the first process. There's a fuzzy bit to this: on Unix you can of course use fork() to copy a process and almost all of its state, but exec tosses most of that state out, and Windows doesn't support the fork behavior anyway.

Together, these two points mean the second program will have to set up a csv reader again.

However, the open standard input (stdin) data stream is, as far as I know, passed along to the launched subprocess. So you should be able to just set up the csv reader inside that script.
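A minimal sketch of that idea, assuming the wrapper never reads sys.stdin itself. The function name run_inner and its parameters are hypothetical, not taken from the original scripts. It uses subprocess.call rather than os.execv so that the wrapper waits for the inner script; with stdin=None and stdout=None the child is left with the wrapper's own streams. Whether this behaves the same way under Splunk on Windows is not verified here.

```python
import subprocess


def run_inner(python_executable, real_script, args, stdin=None, stdout=None):
    """Run real_script under another interpreter and return its exit code.

    When stdin/stdout are None, the child uses the wrapper's own streams,
    so the inner script can build csv.reader(sys.stdin) itself.
    """
    cmd = [python_executable, real_script] + list(args)
    # The wrapper must not have consumed stdin before this call,
    # otherwise the child will see an empty (or partial) stream.
    return subprocess.call(cmd, stdin=stdin, stdout=stdout)
```

In the wrapper this could be called as sys.exit(run_inner(python_executable, real_script, sys.argv[1:])), with the csv.reader(sys.stdin) line removed from the wrapper and placed in namelookup.py instead.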

If the problem is instead that you want to feed this data from a Splunk-specific script to some other script that doesn't know it will be getting CSV, then you've got a relatively tricky little programming problem. The csv reader has to run in one process, which launches the other process and talks to it through some form of IPC that the other process expects, whether that's a named pipe, object remoting, or some other scheme.

As for the stated problem (we run scripts with Splunk's Python, but your own Python is the one with pyodbc set up): I think you can just skip this question and set up the csv reader inside your inner namelookup.py script. The csv module is a generic Python module, not specific to Splunk, so it will work just fine in both installs.

The only special caveats for using the csv reader are:

  • Be sure to set up the csv reader to handle the header row, which will always be present. Since some CSV has no header, Python doesn't do this automatically.
  • You should probably tell Python that it's OK for rows / fields to be large. Splunk's unstructured data can sometimes be large (when a big event comes through), and by default Python raises an exception if a field exceeds its size limit. I configured my csv reader to allow 10MB fields when doing this.
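The two caveats above can be sketched roughly as follows. The function name read_lookup_rows and the constant TEN_MB are hypothetical names chosen for illustration; csv.field_size_limit and csv.reader are standard library calls. The stream argument would normally be sys.stdin.

```python
import csv

TEN_MB = 10 * 1024 * 1024  # assumed limit, matching the 10MB mentioned above


def read_lookup_rows(stream, max_field_bytes=TEN_MB):
    """Return (header, rows) where each row is a dict keyed by header name."""
    # Raise the per-field limit so one large event does not abort the parse.
    csv.field_size_limit(max_field_bytes)
    reader = csv.reader(stream)
    # The first line is the header; consume it explicitly.
    header = next(reader, None)
    if header is None:
        return [], []
    # zip() silently ignores any fields beyond the header's length.
    rows = [dict(zip(header, row)) for row in reader]
    return header, rows
```

The header list is returned as well, since the lookup output generally needs to be written back with the same column names.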

Path Finder

jrodman, I would greatly appreciate it if you could provide an example of your tempfile suggestion, as I am new to Python.

Splunk Employee

The quickest and dirtiest thing is probably to read sys.stdin, write it to a tempfile, and then run your script with that tempfile as an argument. It sucks, but you can probably get a proof of concept going.
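A rough sketch of that tempfile approach, under some assumptions: the function name run_with_tempfile is hypothetical, the temp file path is passed as the first argument to the inner script (ahead of the field names), and subprocess.call is used instead of os.execv so that the wrapper is still around to delete the file afterwards.

```python
import os
import shutil
import subprocess
import tempfile


def run_with_tempfile(python_executable, real_script, args, in_stream, out_stream=None):
    """Copy in_stream to a temp file, run real_script on it, then clean up."""
    fd, path = tempfile.mkstemp(suffix=".csv")
    try:
        # Write everything Splunk sent on stdin into the temp file.
        with os.fdopen(fd, "w") as tmp:
            shutil.copyfileobj(in_stream, tmp)
        # Assumed convention: the inner script reads the CSV from argv[1].
        cmd = [python_executable, real_script, path] + list(args)
        return subprocess.call(cmd, stdout=out_stream)
    finally:
        os.remove(path)
```

In the wrapper this might be called as run_with_tempfile(python_executable, real_script, sys.argv[1:], sys.stdin), and the inner script would then open sys.argv[1] and pass the file object to csv.reader.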

Splunk Employee

Yes, but as I understand it, the stdin of a subprocess should match the stdin of the launching process. Just be sure not to read the data in the launching process, and it should be available to the subprocess.

We do do some tricks to arrange for stdin to be a temporary file, but I assumed those happen outside the process and are not a Python behavior.

In the worst case, you could use some nasty arrangement where you run the other script via the subprocess module and communicate with it over a pipe.

It's also possible that Windows differs in how it handles execv.
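For the pipe arrangement mentioned above, a minimal sketch might look like this. The function name run_over_pipe is hypothetical, and it assumes the wrapper has already read the raw CSV bytes from stdin and that the inner script reads CSV from its own stdin and writes its result to stdout.

```python
import subprocess


def run_over_pipe(python_executable, real_script, args, payload):
    """Send payload (bytes) to the inner script's stdin; return (exit code, stdout bytes)."""
    cmd = [python_executable, real_script] + list(args)
    proc = subprocess.Popen(cmd, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    # communicate() writes the payload, closes the pipe, and waits for exit,
    # which avoids deadlocks from filling one pipe while reading the other.
    out, _ = proc.communicate(payload)
    return proc.returncode, out
```

The wrapper would then write the returned bytes to its own stdout so that Splunk receives the inner script's output.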

Path Finder

jrodman, thanks for the wonderful explanation, but having the csv reader inside the namelookup.py script does nothing, because it is namelookupWrapper.py that gets the input from Splunk, as configured in transforms.conf. The reason for the wrapper script is to bridge the Python version gap: Splunk's Python 2.6 doesn't support pyodbc on a 64-bit Windows OS, so I have a system installation of Python 2.7 that hosts the namelookup.py script. So the challenge is how to connect these two scripts to make the external lookup work.
