Lookup in splunk

loveforsplunk · ‎05-15-2017

I want to use lookup in splunk . I am very new to lookup command . I have uploaded a csv file , suppose named lookupfile.csv having four columns col1 , col2, col3, col4 . But I only want to get the details in col 3 based on search col1.

My search string is
index=abc| table col1 col3| lookup lookupfile col1 OUTPUTNEW col3

Now, among all the columns I have fields in events for only col1 and col 4 and do not have any field created for col2 and col3.

Following are my questions:
1. Is it necessary to have fields created in splunk for all the columns in CSV file.
2. How do I create field for col3 which I want to be the output based on the search of rows in col1.

kearaspoor · ‎05-16-2017

Recommend you read all the details here:
http://docs.splunk.com/Documentation/Splunk/6.6.0/SearchReference/Lookup

But in short, from your search: (index=abc| table col1 col3| lookup lookupfile col1 OUTPUTNEW col3)

On the assumption that col3 doesn't already exist in your raw event data , you don't necessarily need to do the | table command before the lookup. There's nothing wrong with doing so, it just doesn't make a lot of sense if that column is just blank. If col3 does exist in your raw events, that's a whole other story and there's nothing at all wrong with the table command.

But assuming you just want to lookup up col1 and output col3, you've got the basic lookup syntax. The tricks are:

1) Make sure that "lookupfile" matches the stanza in the transforms.conf file which specifies the location of the lookup table file. Depending on if you created a lookup definition or not (http://docs.splunk.com/Documentation/Splunk/6.6.0/Knowledge/Aboutlookupsandfieldactions) this value may require the .csv at the end or be a different name than your csv file.

2) Make sure that col1 and col3 from your events each match the exact field name (case-sensitive) of what you're referencing in the table. If there's any differences use AS clauses as specified from the doco under the "Syntax" section.

So in the end I'd tend to do:
index=abc| lookup lookupfile lookupcol1 AS eventcol1 OUTPUTNEW col3 | table lookupcol1 col3
OR
index=abc | table eventcol1 | lookup lookupfile lookupcol1 AS eventcol1 OUTPUTNEW col3

Either should get you basically the same end results.

To specifically answer your questions:
1) No, you can use any combinations of the fields from your event data to cross-reference to any number of columns from the lookup table. There's no need to use all, just what you need. You can also do multiple lookups to also get any other combination you might be interested in... lookup col2 output col4, or lookup col1 output col 3 and col2

2) You don't need to create col3 the lookup command itself creates it. Read the "Usage" section of the doco to explain when/why you'd prefer to use OUTPUT vs OUTPUTNEW either of those options.

loveforsplunk · ‎05-17-2017

Thanks a lot for your knowledgeable information. I have the below requirement .. can u help.

index=abc|lookup lookupfile col1 col2 OUTPUTNEW col3 | stats count by col1 col2 col3

Here my expectation is if the events matches col1 and col2 it will output the value in col3 after matching it in the table . But if there is a new event which is not present in the col1 then also it will display in statistics giving col3 as blank

kearaspoor · ‎05-18-2017

I'm not aware of any way to force the lookup command to use more than one field on the input side but you could do two lookups to get 2 new columns:

index=abc
|lookup lookupfile col1 OUTPUTNEW col3 AS Match_col1
|lookup lookupfile col2 OUTPUTNEW col3 AS Match_col2

Then use the eval command to compare those 2 new columns:
| eval combined=if(isnull(Match_col1,"null",Match_col2)

This will create a new column called "combined" where if the value of Match_col1 is blank (ie: col1 did NOT produce an output of col3) it will display "null" otherwise (for those lookups where col1 DID match), it'll carry forward the value Match_col2, which is where looking up col2 also resulted in an output of col3

This is particular option has been tested on a dataset where both the "col1" and "col2" will always have at least one matching entry to "col3".... I'm not certain how it'll behave if you have entries where both "col1" and "col2" don't have any corresponding value on your table. I'd assume it will result in a null entry, but you should probably validate that rather than assume.

Lookup in splunk

Extending Observability Content to Splunk Cloud

More Control Over Your Monitoring Costs with Archived Metrics!

New in Observability Cloud - Explicit Bucket Histograms