Solved: Example of doing an external lookup using HTTP GET...

dpadams · ‎01-09-2011

I've been looking at Splunk's external lookup features and they sound ideal for several of my logs. For example, I've got a log with a user ID where I'd like to be able to do counts based on their sales region. I can easily get the region name or ID from the user by calling a simple URL like the illustrative example below:

http://www.foo.com/splunk/lookup/GetSalesRegionNameFromUserID?user_id=1b4545x5

With a result of an empty string or an ID.

Can anyone point me to an example that uses HTTP GET or POST to call a URL to satisfy an external lookup? As far as the various configuration file modifications and packaging requirements, I figure I'll use example from the following article by Nimish Doshi as a starting point:

http://blogs.splunk.com/2009/09/14/enriching-data-with-db-lookups-part-2/

That article use SQL calls but, otherwise, the idea is much the same. It might be relevant to mention that I'm pretty clueless about Python but that I have total control over the Web interface. (The URL, inputs and outputs.) I've already written and tested the Web+DB side of this, I just need to figure out how to call the URL from Python and get the inputs/outputs sorted out with Splunk.

Getting a working example would be incredibly helpful to me as, starting from that, I could build out many other similar functions.

Thanks in advance for any help, pointers, or suggestions.

P.S. I looked at GET workflow actions but they seem to be about adding interactivity to search results. What I'm after right now is the ability to do statistics based on external lookup field values as well as improve the meaningfulness of log entries in results.

ayme · ‎01-09-2011

Here is a quick example of a lookup script. Change the value returned by the lookup() function according to how the response of your web page looks as well as the URL that you call. The field returned will be called "SalesRegion".

#!/usr/bin/env python

####################################
#   Translates a URL+id to         #
#  to a value returned by Url      #
#                                  #
####################################
import csv
import urllib2
import sys

def main():
    # Get name of column holding ID field to translate
    idColumn = sys.argv[1]
    if idColumn == None:
        print "incorrect argument for script lookup - ensure a fieldname with name of column holding ID is a parameter"
        sys.exit(0)

    #set name of column that will be created holding translated value (the Sales Region basically)
    salesRegionColumn = 'SalesRegion'
    reader = csv.reader(sys.stdin)
    writer = None
    header = []
    first = True

    for line in reader:
        if first:
            header = line
            #append new column name to list of headers
            header.append(salesRegionColumn)
            if idColumn not in header:
                print "Id field must be submitted"
                sys.exit(0)
            csv.writer(sys.stdout).writerow(header)
            writer = csv.DictWriter(sys.stdout, header)
            first = False
            continue

        result = {}
        i = 0
        #loop through the list of headers
        while i < len(header):
            #print 'length of line is %d' %len(line)
            if i < len(line):
                result[header[i]] = line[i]
            else:
                result[header[i]] = ''
            i += 1

        #if the SalesRegion already exists for any particular reason then just output the existing value
        if len(result[salesRegionColumn]):
            writer.writerow(result)

        #if value of ID exists, then lets use that to translate it to SalesRegion
        elif len(result[idColumn]):
            print 'url is %s' %result[idColumn]
            salesRegion = lookup("http://www.foo.com/splunk/lookup/GetSalesRegionNameFromUserID?user_id=",result[idColumn])
            result[salesRegionColumn] = salesRegion
            writer.writerow(result)

    def lookup(url,id):
        fullUrl = url+id
        try:
            response = urllib2.urlopen(fullUrl)
        except urllib2.URLError:
            return 'cannot translate ID due to UrlError'
        except:
            return 'cannot translate ID'

    data = response.read()
    #Return the first eight characters of response.  You could change this to whatever you wanted (apply regex for example).
    return data[:8]

main()

Your transforms.conf stanza should be something like this (where the name of the field holding the ID to translate is called "user_id"):

[your_lookup_name]
external_cmd = the_script_name.py user_id
fields_list = user_id, SalesRegion

View solution in original post

karaujo · ‎11-27-2014

This is great information. Thank you to everyone for this.

I have a question regarding saving the retrieved SalesRegion to the Splunk index. What I'm finding is my searches have really slowed down since the python script is making several calls to the web to resolve the user id. Moreover, it's not actually writing the SalesRegion to the index so every time I do a search, run a report, create a pivot, etc, the python script is re-invoked and subsequently, so are the HTTP calls for the entire search scope.

Can anyone help here?

ayme · ‎01-23-2011

Tested it, as long as your search returns a field named user_id and your script is written to return a column named MarketID (header.append("MarketID")) should work with transforms.conf:

[marketIDLookup] 
external_cmd=user_to_market_id.py user_id MarketID 
external_type=python 
fields_list=user_id, MarketID

props.conf:

[int_web_log] 
LOOKUP-table=marketIDLookup user_id OUTPUT MarketID AS market_id

ayme · ‎01-09-2011

Here is a quick example of a lookup script. Change the value returned by the lookup() function according to how the response of your web page looks as well as the URL that you call. The field returned will be called "SalesRegion".

#!/usr/bin/env python

####################################
#   Translates a URL+id to         #
#  to a value returned by Url      #
#                                  #
####################################
import csv
import urllib2
import sys

def main():
    # Get name of column holding ID field to translate
    idColumn = sys.argv[1]
    if idColumn == None:
        print "incorrect argument for script lookup - ensure a fieldname with name of column holding ID is a parameter"
        sys.exit(0)

    #set name of column that will be created holding translated value (the Sales Region basically)
    salesRegionColumn = 'SalesRegion'
    reader = csv.reader(sys.stdin)
    writer = None
    header = []
    first = True

    for line in reader:
        if first:
            header = line
            #append new column name to list of headers
            header.append(salesRegionColumn)
            if idColumn not in header:
                print "Id field must be submitted"
                sys.exit(0)
            csv.writer(sys.stdout).writerow(header)
            writer = csv.DictWriter(sys.stdout, header)
            first = False
            continue

        result = {}
        i = 0
        #loop through the list of headers
        while i < len(header):
            #print 'length of line is %d' %len(line)
            if i < len(line):
                result[header[i]] = line[i]
            else:
                result[header[i]] = ''
            i += 1

        #if the SalesRegion already exists for any particular reason then just output the existing value
        if len(result[salesRegionColumn]):
            writer.writerow(result)

        #if value of ID exists, then lets use that to translate it to SalesRegion
        elif len(result[idColumn]):
            print 'url is %s' %result[idColumn]
            salesRegion = lookup("http://www.foo.com/splunk/lookup/GetSalesRegionNameFromUserID?user_id=",result[idColumn])
            result[salesRegionColumn] = salesRegion
            writer.writerow(result)

    def lookup(url,id):
        fullUrl = url+id
        try:
            response = urllib2.urlopen(fullUrl)
        except urllib2.URLError:
            return 'cannot translate ID due to UrlError'
        except:
            return 'cannot translate ID'

    data = response.read()
    #Return the first eight characters of response.  You could change this to whatever you wanted (apply regex for example).
    return data[:8]

main()

Your transforms.conf stanza should be something like this (where the name of the field holding the ID to translate is called "user_id"):

[your_lookup_name]
external_cmd = the_script_name.py user_id
fields_list = user_id, SalesRegion

rharrisssi · ‎10-25-2014

Why is half the script in the answer above out of the code box?

grijhwani · ‎10-27-2014

Bad formatting of the code. The answer had been pasted in as pure HTML, not board markup. I have now corrected this, and also removed your remark to being a comment on the answer.

dpadams · ‎01-09-2011

I think I'm actually left with possibly two problems:

1) Something is wrong in my configuration statements as the field does not show up in the field list or picker.

2) I may be returning the data in the wrong format. I'd like to return the wrong data but have also tried

MarketID+CRLF+data+CRLF

Getting as far as seeing the requests come in and the response go out is a huge win, but I'd still be grateful for help figuring out how to get everything wired into Splunk properly.

P.S. I don't see that it matters but I put my script into

$SPLUNK_HOME/etc/searchscripts/user_to_market_id.py

dpadams · ‎01-09-2011

Added a stanza to props.conf tied to the sourcetype, like

[int_web_log]

lookup_table = marketIDLookup user_id OUTPUT MarketID AS market_id

Added a stanza to transforms.conf, like:



[marketIDLookup]

external_cmd = user_to_market_id.py user_id MarketID

external_type = python

fields_list = user_id, MarketID

Restarted Splunk.

When I load data from the int_web_log sourcetype, the new field does not show up in the list. I've experimented with trying to do the lookup directly from the search bar using a pipe, but get no results their either.

dpadams · ‎01-09-2011

Thanks for the code! That's a huge help, thanks very much. I've adapted what you posted and Splunk is now calling my Web interface correctly. I can see the requests coming in with the right arguments and am able to respond. Right now, nothing is showing up in Splunk, but I can see that the data is going over the wire. So, it looks like I still have a Splunk configuration step done incorrectly. I've been going through samples and the docs and have done the following:

Example of doing an external lookup using HTTP GET or POST?

Discover Powerful New Features in Splunk Cloud Platform: Enhanced Analytics, ...

Splunk Classroom Chronicles: Training Tales and Testimonials

Access Tokens Page - New & Improved