About my Environment
Everything here is run using Splunk 6.4.2.
The Problem
I need to correlate session IDs and IP addresses between two sets of
data. It involves:
1. Finding the session IDs (sid) and source IPs (src_ip) from the first
set of data.
2. Finding those same session IDs (sid) in the second set of data, even
if they don't match the src_ip from the first set of data.
3. Yielding events from the second data set with the fields sid, src_ip
and first_ip, where sid is the same between both data sets, src_ip
is the value from the second data set, and first_ip is the value of the
sid's src_ip from the first data set.
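To make the desired result concrete, here is a minimal sketch in Python rather than SPL, using made-up events. The sample values, the correlate function, and the assumption that each sid has a single src_ip in the first data set are mine, for illustration only. It describes the output I'm after, not how Splunk should compute it.

```python
# Hypothetical sample events; only the field names (sid, src_ip) come from the post.
first_set = [
    {"sid": "A1", "src_ip": "10.0.0.1"},
    {"sid": "B2", "src_ip": "10.0.0.2"},
]
second_set = [
    {"sid": "A1", "src_ip": "192.168.1.5"},  # same sid, different IP
    {"sid": "B2", "src_ip": "10.0.0.2"},     # same sid, same IP
    {"sid": "C3", "src_ip": "172.16.0.9"},   # sid not in the first set
]


def correlate(first, second):
    """Return second-set events whose sid occurs in the first set, with first_ip added."""
    # Map each sid in the first set to its src_ip.
    # Assumption: one src_ip per sid; if a sid repeats, the last value wins.
    first_ip_by_sid = {event["sid"]: event["src_ip"] for event in first}

    # Keep only second-set events with a known sid and attach first_ip.
    return [
        {
            "sid": event["sid"],
            "src_ip": event["src_ip"],
            "first_ip": first_ip_by_sid[event["sid"]],
        }
        for event in second
        if event["sid"] in first_ip_by_sid
    ]


result = correlate(first_set, second_set)
for row in result:
    print(row)
```

With this sample data, the events for sids A1 and B2 are kept, each carrying its own src_ip plus the first_ip taken from the first data set, and the event for C3 is dropped.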
I've come very close to a resolution using the following pipeline:
(a) pull the sid from the first set as a subsearch on the second set
(b) create a sid/first_ip pairing in a lookup table "mylookup.csv"
(c) perform a lookup on mylookup.csv to match sid to first_ip
Here's the search I've tried:
(01) {query for second set} [search {query for first set} | rename src_ip as first_ip | table sid,first_ip | outputlookup mylookup.csv | fields sid] | lookup mylookup.csv sid OUTPUT first_ip
Search (01) runs fine without the "lookup" clause, in that it returns
all of the events from the second data set with the same sid as those
in the first. When I run the search as written, though, I get the
error:
(02) Error in 'lookup' command: The lookup table 'mylookup.csv' does not exist or is not available.
What's strange is that I know the lookup must exist, because after
running search (01), I can retrieve the table's contents using the
following command [1]:
(03) | inputlookup mylookup.csv
Things I've Tried
I tried using mylookup.csv to look up sid as another field [2], like this:
(05) {query} | lookup mylookup.csv sid AS my_sid
That returned the same "does not exist or is not available" error.
I've even tried first running a search that creates mylookup.csv, then
running a search to perform a lookup on mylookup.csv, like this:
(04a) {query for first set} | rename src_ip as first_ip | table sid,first_ip | outputlookup mylookup.csv
(04b) {query for second set} [ search {query for first set} | fields sid ] | lookup mylookup.csv sid OUTPUT first_ip
Search (04a) completes, but I still get the same error at (02) when I
run the search at (04b).
I've checked the "Exploring Splunk" book, my Splunk training material,
and answers.splunk.com, and haven't found anything that explicitly
covers using a lookup table created by outputlookup, only how
to create one.
Questions
(A) Is there a canonical way of referencing lookups that you've
created using outputlookup that I'm missing? Do I need to create a
lookup definition for the lookup table I create, or is
mylookup.csv sufficient?
(B) Is there a better way to perform the kind of correlation I want? I
haven't tried the KV Store yet, as I'd like to know that I can use
the output of outputlookup first.
Thanks!
[1] https://answers.splunk.com/answers/144139/how-do-i-search-a-csv-file-created-via-outputlookup.html
[2] https://answers.splunk.com/answers/54165/lookup-use-without-lookup-definition.html