how to perform lookup on CSV file from search on index?
For example below: I want to find out if "name" on employee.csv exists on "name" on testindex
With the data below, the result should be name3 addr3 phone3 Please help. Thank you!!
index=testindex | inputlookup employee.csv | field name ??? ==> does not work
result:
name | address | phone
name1 addr1 phone1
name3 addr3 phone3
employee.csv
name | position | company |
name3 position3 company3
The inputlookup command is for something completely different - it returns the contents of your lookup file. If you want to perform a lookup, you just need to use... yes, the lookup command!
So your original idea was a good one, just with a wrong command
index=whatever
| lookup employee.csv
| [...]
Hi @LearningGuy
Here are some example methods
Only outputs results matched against name field in lookup file
index=testindex name=*
| lookup employee.csv name OUTPUT positon company
Outputs all results and tests whether a match in lookup file existed
index=testindex name=*
| inputlookup append=true employee.csv
| fields name address phone position company
| stats values(*) AS * BY name
| eval intestindex=if(isnotnull(company), "yes", "no")
Hope that helps
To verify if something exists in a lookup it's usually better to just do the lookup and verify the results either with something like
<...something...>
| lookup lookup.csv lookupfield OUTPUT lookupfield AS outputfield
| eval found=if(outputfield==lokupfield,1,0)
Or even
<...something...>
| lookup lookup.csv lookupfield OUTPUT lookupfield
(In this case the field values which are not in the lookup will get nulled-out).
The lookup-based approach is more effective compared to inputlookup and stats because lookup is a distributable streaming command and you can still continue processing your search pipeline on the indexers whereas stats will consolidate the results and move processing to SH.
Hello,
index=testindex name=* | lookup employee.csv name OUTPUT position company
1) a) Is name=* required as a field that I want to look up?
b) if I want to lookup name on employee.csv that has name2, should I use | lookup employee.csv name2?
c) So if the field that I want to lookup is address, should I use address=*?
2) I used your query to perform lookup on employee.csv that does not have any match, but it still showed all output on the search result. It also did not show position and company field
Can you please fix this? Thanks
index=testindex name=*
| inputlookup append=true employee.csv
| fields name address phone position company
| stats values(*) AS * BY name
| eval intestindex=if(isnotnull(company), "yes", "no")
1) The search worked fine. Can you please explain the logic behind it?
2) I am not sure the purpose of append=true. Did it append something on csv files?
3) What is the purpose of: of values (*) AS * BY Name
4) What is the difference between inputlookup and lookup?
I appreciate your help. Thanks
The Splunk docs are a good place to get more acquainted with SPL commands
https://docs.splunk.com/Documentation/Splunk/9.0.4/SearchReference/Lookup
https://docs.splunk.com/Documentation/Splunk/9.0.4/SearchReference/Inputlookup
1. a) The name=* does a couple of things: it filters events that have name=<value>, and it also tells Splunk query that you are interested in this field so extract it as a key value pair. This may be important depending on the search mode you are using (fast, smart, verbose). And yes, it is a must for the lookup to work.
b) In this case, the event "name" field equates to a lookup csv header "name". If the search had a field called name1 then the lookup syntax is "lookup employee.csv name AS name1"
c) Yes, though note, altering your base search filters may also alter your result output. Like most scripting languages there are many ways to do things, which have there pros and cons. But generally, if you know you are interested in a field, (or fields), then using them in the base search helps ensure they are auto extracted and ready for subsequent usage.
2. Yeah, my mistake. You can filter for matches like this.
index=testindex name=*
| lookup employee.csv name OUTPUT positon company
| search company=*
If the lookup has a match (and assuming the event does not have a field called company) then event will be enriched with the company field result
For the second query, here are some comments...
index=testindex name=*
``` append the whole lookup file to the search results ```
| inputlookup append=true employee.csv
``` output the fields I'm interested in ```
| fields name address phone position company
``` summarise/transform the results into a table using name as the grouping jey```
| stats values(*) AS * BY name
``` add a new field with a tested value ```
| eval intestindex=if(isnotnull(company), "yes", "no")
As you may have noticed, the stats command changes the output so you do not see the _raw events anymore.
Play around with it to keep learning. Remove the bottom line and repeat search to see how each line changes the result set.
If this has answered your question then please mark this as solution provided.
Hello,
I appreciate your help. My apology. It turns out that there are 2 fields that I needed to filter out.
The lookup solution that you provided worked, but not with 2 column.
The inputlookup solution that you provided give me all results, instead of providing a filter.
My goal is to lookup and filter, not only lookup. See the scenario below. Please help. Thank you so much
ip location (table 1)
1.1.1.1 NY
2.2.2.2 CA
3.3.3.3 TX
4.4.4.4 GA
ip1 | ip2 | name (table 2)
2.2.2.2 | | name2 (ip2 is empty on this row)
| 4.4.4.4 | name4 (ip1 is empty on this row)
output: (table 1 lookup and filter based on table 2)
ip | ip1 | ip2 | location | name
2.2.2.2 2.2.2.2 NY name2
4.4.4.4 4.4.4.4 GA name4
The aim was only ever to demonstrate how lookups could be used. Further filtering of results can be done with "search", "where", or the "regex" commands.
I'm unclear with the problem you have outlined as you have presented it all as tables. What is the lookup and what are the raw events. Maybe present the SPL you already have with examples of the raw events.
If you want to have the lookup match (table 1?) on the raw events (table2?) then you just need to do two lookups. Here's a run anywhere example
| makeresults| eval _raw="ip,location
1.1.1.1,NY
2.2.2.2,CA
3.3.3.3,TX
4.4.4.4,GA"
| multikv forceheader=1 | table ip location | outputlookup mytemplookup.csv ``` 1. create a lookup file (table1)```
``` 2. create some dummy events (table2)```
| streamstats count
| eval ip1=if(count=2, ip, null())
,ip2=if(count=4, ip, null())
,name=case(count=2, "name2", count=4, "name4")
| table ip1 ip2 name
``` 3. the following uses the lookup and filters results ```
| lookup mytemplookup.csv ip AS ip1 OUTPUT ip location
| lookup mytemplookup.csv ip AS ip2 OUTPUT ip location
| search ip1=* OR ip2=*
| table ip* location name
The first two parts simply create some dummy data to demonstrate (an example) how you would use the SPL in the third part of the query to produce your output. Obviously, you'll need to alter this to work with what you have got so far.
Hopefully that makes sense
Hello,
Sorry I wasn't clear. Table 1 is from raw event (syslog), let say index=table1. The content is not only 4 rows, it's more than 100k in 24 hour. I am trying to filter out that 100k ip addresses in table1 events from ip addresses listed in CSV (i called it table2 that has 2 column IP address1, ip address2)
Your initial lookup solution worked but only for 1 column (ipaddress_1).
If I do this manually, I would put the data in excel and perform a VLOOKUP and filter out the row
Thank you!!
Hey Yeahnah,
I think I figured it out using lookup below:
index=table1
| lookup table2.csv ip1 as ip OUTPUTNEW ip
| lookup table2.csv ip2 as ip OUTPUTNEW ip
| search ip1=* or ip2=*
| table ip, ip1, ip2, location, name
Although I am still not sure why it worked by doing lookup twice 🙂 and not sure why search work with OR instead of AND (filter out both blank ip1 and ip2)
And I was unable to get it to work using inputlookup
Thanks for your help!