Below is the props.conf at $SPLUNK_HOME/etc/system/local:
[SPLUNK_SERVICE_Log]
lookup_table = namelookup Id OUTPUT Name
Below is the transforms.conf at $SPLUNK_HOME/etc/system/local:
[namelookup]
external_cmd = namelookup.py Id Name
external_type = python
fields_list = Id, Name
Script location :
$SPLUNK_HOME/etc/system/bin/namelookup.py
# File namelookup.py
# ------------------------------
import os,csv
#import pyodbc
import sys
import logging
import logging.config

def main():
    #if len(sys.argv) != 3:
        #print "Usage: python name_lookup.py [id field] [name field]"
        #sys.exit(0)
    logging.config.fileConfig("logging.conf")
    # create logger
    logger = logging.getLogger("namelookup")
    # "application" code
    logger.debug("====Inside Main=====")
    idf = sys.argv[1]
    namef = sys.argv[2]
    r = csv.reader(sys.stdin)
    w = None
    header = []
    first = True
    d1 = {}
    # Add items
    d1["006981166"] = "John"
    d1["007094117"] = "Mike"
    d1["007094118"] = "Scott"
    for line in r:
        if first:
            header = line
            print "Header:", header
            if idf not in header or namef not in header:
                print "Id and Name fields must exist in CSV data"
                sys.exit(0)
            csv.writer(sys.stdout).writerow(header)
            w = csv.DictWriter(sys.stdout, header)
            first = False
            continue
        # Read the result
        result = {}
        i = 0
        while i < len(header):
            if i < len(line):
                result[header[i]] = line[i]
            else:
                result[header[i]] = ''
            i += 1
        # Perform the lookup
        if len(result[idf]) and len(result[namef]) :
            w.writerow(result)
        elif len(result[idf]):
            result[namef] = lookup(result[idf], d1)
            if len(result[namef]):
                w.writerow(result)

# Given a Id, find its Name
def lookup(id, d1):
    try:
        for key in d1.keys():
            if key == id:
                #print "Value=", d1[key]
                return d1[key]
    except:
        return []

main()
However, when I run the search below, it doesn't return any results under name:
source="Test_Log.txt" | xmlkv entry | lookup namelookup Id OUTPUT Name | table Id, name
Please let me know where I am going wrong in the script, or where exactly the script is failing. Is there a way to debug the script using the Komodo Edit IDE? I want the debugger to launch the moment I hit Enter in the Splunk Web interface, because I am not even sure the script is invoked by Splunk. I would like to see at least the first print statement in the script printed to the console.
When I tried to run it as a standalone program using the command
splunk cmd namelookup.py 123
it opens a command prompt and immediately closes it, so I am not sure what is going on with this script.
Thanks for the wonderful explanation, hexx.
I did exactly as you recommended, and my scripted lookup behaves in the same manner as yours, i.e. the results displayed on stdout are as desired:
C:\Splunk\etc\system\bin>db_lookup.py memberId memberName < memberInput.csv
produces the following output, retrieving memberName from the database:
memberId,memberName
006,RANDY
007,LEONY
009,RANDOLPH
However, when I invoke the scripted lookup from a Splunk search as shown below, it doesn't return any results under the memberName column:
source="Test_Log.txt" | xmlkv entry | lookup namelookup memberId OUTPUT memberName | table memberId, memberName
Please let me know what I am missing.
Any update on this?
Hello Bansi.
In agreement with jrodman, I believe that at this point it is important to focus on the validation of the external lookup script at a low level, from the Splunk command line.
To better understand the constraints that an external lookup script must satisfy, I would recommend a careful read of this section of our documentation :
As an example, I will demonstrate here how the host/ip external lookup script that ships with Splunk (external_lookup.py) can be validated in that way :
To test the script, we need a CSV input file. This file should contain a header listing the fields we want to work with. One of these should be the "input" field (in our example, "host") and the other should be the "output" field (in our example, "ip"). Both are passed as arguments to the script.
Here's what my input file "input.csv" looks like :
host,ip
www.hardware.fr,
www.bash.org,
www.somafm.com,
Note that only the "host" column is populated here. When I feed this file as input to external_lookup.py while specifying that it should look at the "host" and "ip" fields as they are defined in the CSV header, the script will use external DNS resolution to fill in the blanks.
# $SPLUNK_HOME/bin/splunk cmd python $SPLUNK_HOME/etc/system/bin/external_lookup.py host ip < input.csv
In our case, we are getting the expected result : The lookup script shows on stdout a now complete CSV file. The "ip" field has been populated for each line by performing DNS resolution on the value of the "host" field for that line :
host,ip
www.hardware.fr,83.243.20.80
www.bash.org,69.61.106.93
www.somafm.com,64.147.167.20
In the context of search, the splunk-search process would use that CSV output to enrich events by adding an "ip" field and populating it with the values generated.
My recommendation to you is to make sure that your own scripted lookup behaves in this manner when tested and can operate with the same arguments/inputs. I also think that it would be a good idea to make your lookup bi-directional and able to look up a name given an ID.
Thanks for the wonderful explanation. I did exactly as you recommended, and my scripted lookup behaves in the same manner as yours, i.e. it works fine.
However, when I run the search below, it doesn't return any results under name:
source="Test_Log.txt" | xmlkv entry | lookup namelookup Id OUTPUT Name | table Id, name
Please note that the purpose of the lookup script in my case is to retrieve Name from a database for a given Id in a Splunk search query, i.e.
.....| lookup namelookup Id OUTPUT Name
Please note I modelled the lookup script on external_lookup.py, which ships with the installation, or http://blogs.splunk.com/2009/09/14/enriching-data-with-db-lookups-part-2/
These scripts take "CSV input from Splunk via standard input".
This doesn't seem to be working in my case. So is there a way to debug the lookup script to make sure CSV input from Splunk is really supplied to the lookup script via stdin?
To prove my point, I modified the lookup script by commenting out the following lines, and it runs perfectly fine as a standalone program:
#namef = sys.argv[2] // This doesnt really make sense.
#r = csv.reader(sys.stdin)
#w = None
#header = []
#first = True
#csv.writer(sys.stdout).writerow(header)
#w = csv.DictWriter(sys.stdout, header)
So the question boils down to how to make the script take "CSV input from Splunk via standard input". Any pointers/suggestions to make the script work will be greatly appreciated.
jrodman, thanks for the suggestion. Would you mind providing an example code snippet showing how to perform the logging you suggested, i.e. open a logfile -- lf = open(logfile, "a", 1 ) -- at the beginning of the script and log variable states and debug lines to it with lf.write(), using such things as repr(), str(), the pprint module?
I don't know how to use Komodo. A lookup script is typically invoked multiple times and is extremely short-lived. If you know a good recipe to attach a debugger to a very short-lived process, go for it. Personally I find this approach slow and cumbersome. Instead, open a logfile -- lf = open(logfile, "a", 1 ) -- at the beginning of the script and write variable states and debug lines to it with lf.write(), using such things as repr(), str(), the pprint module, and other things. This produces repeatable, rapid results across tests and modifications.
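A minimal sketch of that kind of logging might look like the following. It is written for Python 3 (the script in the question uses Python 2 print statements), and the log path and the helper names open_log, log_state and log_pretty are made up for illustration, not part of Splunk.

```python
import os
import pprint
import sys
import tempfile

# Hypothetical log location (an assumption for this sketch); use any path
# that the account running Splunk is allowed to write to.
LOGFILE = os.path.join(tempfile.gettempdir(), "namelookup_debug.log")


def open_log(path=LOGFILE):
    # A buffering value of 1 selects line buffering in text mode, so each
    # line reaches the file even if the script exits right afterwards.
    return open(path, "a", 1)


def log_state(lf, label, value):
    # repr() shows quotes and escapes, which helps spot stray whitespace.
    lf.write("%s = %s\n" % (label, repr(value)))


def log_pretty(lf, label, value):
    # pprint.pformat() keeps nested dicts and lists readable.
    lf.write("%s:\n%s\n" % (label, pprint.pformat(value)))


if __name__ == "__main__":
    lf = open_log()
    log_state(lf, "argv", sys.argv)
    log_pretty(lf, "sample row", {"Id": "006981166", "Name": ""})
    lf.close()
```

Because everything goes to the file and nothing goes to stdout, the CSV that Splunk reads back from the script stays clean.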
I've only read the script diagonally, but it looks like it maps ID to Name, but not Name to ID. Splunk maps in reverse in order to build an efficient search, and then maps forward in order to decorate / enrich the events.
Therefore the script should be able to handle the case where it receives names and emit completed table entries with both the ID and the name. If this is not possible, you may get what you want by emitting an asterisk as the ID.
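As a rough sketch of a lookup that works in both directions, assuming the hard-coded proof-of-concept table from the question and assuming names are unique (Python 3; run_lookup and the two dictionaries are illustrative names, not anything Splunk provides):

```python
import csv
import sys

# Hard-coded proof-of-concept table taken from the script in the question.
ID_TO_NAME = {
    "006981166": "John",
    "007094117": "Mike",
    "007094118": "Scott",
}
# Reverse map; this assumes every name is unique (an assumption of the sketch).
NAME_TO_ID = dict((name, id_) for id_, name in ID_TO_NAME.items())


def run_lookup(infile, outfile, idf, namef):
    reader = csv.DictReader(infile)
    header = reader.fieldnames or []
    if idf not in header or namef not in header:
        return  # emit nothing rather than a non-CSV error message
    # extrasaction="ignore" drops surplus columns on over-long input rows.
    writer = csv.DictWriter(outfile, header, extrasaction="ignore")
    writer.writeheader()
    for row in reader:
        if row[idf] and row[namef]:
            writer.writerow(row)          # already complete
        elif row[idf]:
            name = ID_TO_NAME.get(row[idf])
            if name:                      # forward: Id -> Name
                row[namef] = name
                writer.writerow(row)
        elif row[namef]:
            id_ = NAME_TO_ID.get(row[namef])
            if id_:                       # reverse: Name -> Id
                row[idf] = id_
                writer.writerow(row)


if __name__ == "__main__" and len(sys.argv) == 3:
    run_lookup(sys.stdin, sys.stdout, sys.argv[1], sys.argv[2])
```

With a database behind it, the two dictionary lookups would become two queries, one by Id and one by Name.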
As for your test method, I would recommend
splunk cmd python namelookup.py 123
You could also try this from any python, such as one downloaded yourself from the internets.
As a side note, you may want to configure the csv reader to handle large data sizes if you could imagine that you might ever receive malformed data.
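One way to do that with the standard csv module is csv.field_size_limit(). The sketch below is Python 3; the 10 MB ceiling and the read_rows helper are arbitrary choices for illustration.

```python
import csv

# Hypothetical ceiling of 10 MB per field; choose a value that fits the data.
FIELD_LIMIT = 10 * 1024 * 1024


def read_rows(infile, limit=FIELD_LIMIT):
    # csv.field_size_limit(new_limit) sets the module-wide maximum field
    # size; without it, a field above the default limit raises csv.Error.
    csv.field_size_limit(limit)
    return list(csv.reader(infile))
```

A stray unbalanced quote in the input can make the reader treat many lines as one huge field, which is the usual way to run into the default limit.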
More clearly, your print statement is a bug. Please test your script externally, and validate that the output is csv only. Once you are certain the script is producing valid csv, if the lookup is still not working you may wish to engage splunk support.
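For the external test, a small checker along these lines could be run over the captured stdout of the script. This is only a sketch in Python 3, and check_lookup_output is a made-up helper name.

```python
import csv
import io


def check_lookup_output(text, expected_header):
    """Return a list of problems found in captured lookup output."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return ["no output at all"]
    problems = []
    # The very first line must be the CSV header and nothing else.
    if rows[0] != expected_header:
        problems.append("first line is not the header: %r" % (rows[0],))
    # Every later line should have one value per header column.
    for number, row in enumerate(rows[1:], start=2):
        if len(row) != len(expected_header):
            problems.append("line %d has %d fields" % (number, len(row)))
    return problems
```

Output that starts with a debug line such as Header: ['Id', 'Name'] fails the first check, because that line is parsed as data rather than as the Id,Name header.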
Sorry, I assumed this part worked. It looks like your script is currently configured to produce debug output (print "Header:"...). Did you start with a specific example?
I modified the script to run as a standalone program by commenting out these lines:
csv.writer(sys.stdout).writerow(header)
w = csv.DictWriter(sys.stdout, header)
So the question boils down to how to make it work with Splunk, or how to make it write to a CSV file.
The purpose of the lookup script in my case is, given an Id, to connect to a database and retrieve the Name by Id. So I am not sure what you are suggesting. Could you please elaborate? Please note we don't have the Name value initially available to us; in fact, we are retrieving it from the database by passing the Id value as an argument in a "SELECT" clause. Anyhow, that script is far from working, so I hard-coded the values in a dictionary, with Id as the key and Name as the value, as a proof of concept that Splunk will be able to call the lookup script namelookup.py.