I have seen documentation on lookups in general. I am most interested in dynamic lookups. Specifically I have the following scenario:
how do I setup a JDBC connection to Oracle in the Splunk config files?
How can I use the Oracle connection to lookup a table in Splunk at search time?
Please provide some examples. Thanks!
(2+ years later) there is now the Splunk DB Connect App:
http://splunk-base.splunk.com/apps/50803/splunk-db-connect
I think JDBC connection is irrespective of Splunk. Even if you write a lookup script you need to make a connection to database and load specific driver thats where JDBC/ODBC connection comes into play.
Update: A this point (2017), this answer should only be used as a python external (scripted) lookup example. If you need a DB lookup, please use DB connect instead. Maintaining python db libraries/modules within the python environment that ships with Splunk is a serious pain that should be avoided whenever possible. Nevertheless, you may find some of the following useful.
Here is a sample oracle external lookup script. Obviously you'll have to heavily modify this script to suite your needs. This includes all login information, the SQL query, column and field name mappings, etc.
This script does a lookup on a single primary key and returns adds two new fields. You could make the script more intelligent so that you could do reverse lookups (or lookups on any other field) but that would just make thinks more complicated and it didn't seem suitable for an example.
Additional error handling would be recommended. Right now, the only "exception" handled gracefully is when no row is found (the lookup fails because no value exist in the table.)
Here is the contents of my $SPLUNK_HOME/etc/apps/YourApp/bin/ora_lookup_example.py
script:
import csv
import cx_Oracle
#-------------------------------------------------------------------------------
# This is for debugging the script. You should disable this once everything is
# working. Output is written to $SPLUNK_HOME/var/log/splunk/python.log, which is
# searchable in splunk with: index=_internal source=*python.log ORALOOKUP:
# Be sure to also remove/disable 'logger.info(...)' lines when you disable this.
import logging
logger = logging.getLogger("ora_lookup_example")
#-------------------------------------------------------------------------------
FIELDS = [ "PartnerId", "CorporationName", "OrgUnitName" ]
def db_connect():
# Build a DSN (can be substituted for a TNS name)
dsn = cx_Oracle.makedsn("host.example.net", 1521, "SID")
db = cx_Oracle.connect("username", "password", dsn)
cursor = db.cursor()
return cursor
def db_lookup(cursor, key):
cursor.execute("""
SELECT CORPORATIONNAME, ORGUNITNAME
FROM PARTNER
WHERE PARTNERID = :id
AND STATUS = 'Active'""", dict(id=key))
row = cursor.fetchone()
return row
def main(input, output, cursor):
csv_in = csv.DictReader(input)
csv_out = csv.DictWriter(output, FIELDS)
# Write header row
csv_out.writerow(dict(zip(FIELDS,FIELDS)))
for row in csv_in:
# Get the input file we are looking up
key = row["PartnerId"]
# Fetch two fields we are looking up in the table
db_row = db_lookup(cursor, key)
if db_row:
(corp, org) = db_row
row["CorporationName"] = corp
row["OrgUnitName"] = "" or org # Use empty string if org is NULL
logger.info("ORALOOKUP: PartnerId=%r: corp=%r, org=%r", key, corp, org)
else:
logger.info("ORALOOKUP: PartnerId=%r: No matching entry in table.", key)
csv_out.writerow(row)
if __name__ == '__main__':
import sys
cursor = db_connect()
main(sys.stdin, sys.stdout, cursor)
You also need to setup the external lookup script in transforms.conf
, like so:
[ora_lookup_example]
external_cmd = ora_lookup_example.py
fields_list = PartnerId,CorporationName,OrgUnitName
This is optional, but if you want automatic lookups enabled for a sourcetype, you can add the following to your props.conf
file:
[my_source_type]
LOOKUP-oralookup = ora_lookup_example PartnerId AS RelatedPartnerId OUTPUT CorporationName OrgUnitName
If you don't setup the automatic lookups, then you can do this on a search-by-search basis (which you may find to be much better for overall performance), in which case you would do something like this: (the syntax may look familiar)
sourcetype=my_source_type ... | lookup ora_lookup_example PartnerId AS RelatedPartnerId OUTPUT CorporationName OrgUnitName
Note: In my sourcetype the PartnerId
value is under the field named RelatedPartnerId
, which is why the AS
is necessary for my setup
I tried to add helpful comments in the script. Please make note of them. Hope this helps you get started.
BTW. I recommend that you get started by getting your cx_Oracle install working with splunk's python install first. You can use the code in db_connect() and db_lookup() to do some database connectivity testing before you try to get the whole thing working.
You can query oracle database directly from python using cx_Oracle. (You will want the 5.0.3 or higher since you are using an 11g database). This does require that you have a working oracle client environment setup, or you can use the Oracle Database Instant Client if you don't already have a client installed.
It's not too hard to get this setup and working but there are some potential gotchas in the process as well. For example, using cx_Oracle
(or any other python database module) requires that you somehow install that module in a way that is accessible from splunk's python environment. And you also have to make sure that the splunk environment can access your oracle libraries, which is normally handled via environmental variables. For those familiar with python, there are a number of ways to accomplish this, but so far I don't know splunk preferred method, and so there could be issues with some install approaches that could work, but may end up breaking during a splunk upgrade. (If anyone has some recommendation on this, please point us in the right direction on this.)
I had to build the cx_Oracle
library myself on my Ubuntu 8.04 server, which wasn't too bad. The one gotcha I hit was that Splunk's embedded python install was lacking the full distutils
, which are needed before you can build python C modules. So I had to manually add in some missing modules components before I could get the cx_Oracle
module to build with splunk's python. (And I couldn't build it with a separate Python 2.6 install because of a compile-time Unicode difference I encountered. Ugh! If you can use a pre-compiled version, you shouldn't hit this issue.)
Also, before you jump into writing an external lookup script (not that's it all that difficult), but there may be a more simplistic approach to consider. You should consider how often your lookup data will change. For example, we have a simple script that pulls lookup values from a database and drops them in a CSV file. Then we have splunk configured to use the CSV file as the actual lookup table instead of launching a database query for every lookup. Since this particular lookup info changes very infrequently, this removes the search-time dependency on the database and a local csv file lookup is much faster than a database query. (Our dataset is also only a few hundred records, so the CSV file is pretty small too.) We have this lookup-generating script scheduled to run periodically. It only refreshes the CSV file if something in the lookup table changes, which also lets us create an date-effectivity within our lookup file). Again, this only works for pretty-static lookup info which may or may not be applicable in your scenario. One advantage to this approach is that you can use any tool you want, all it needs to do is create a simple CSV file, so you can use whatever is most ideal for your scenario.
Hopefully you'll find some of this helpful.
Additional resources:
You don't set up a JDBC connection directly in Splunk. You need to build a lookup script yourself to perform whatever lookups you want to do. There is an example lookup script (in Python) shipped with Splunk in $SPLUNK_HOME/etc/system/bin/external_lookup.py
, as well as documentation on it here: http://www.splunk.com/base/Documentation/latest/Knowledge/Addfieldsfromexternaldatasources#Set_up_a_...
Lookup scripts must be written in Python, but they can simply be wrappers around a Python subprocess.Popen()
call to another program.
Of course the Python example script is different from Java, but the basic logic is the same, and not very complex.