How to perform lookup on CSV file from search on i...

LearningGuy · ‎05-23-2023

how to perform lookup on CSV file from search on index?
For example below:   I want to find out if "name" on employee.csv exists on "name" on testindex
With the data below, the result should be name3   addr3   phone3      Please help. Thank you!!
index=testindex    | inputlookup employee.csv   | field name   ??? ==> does not work
result:
name    | address | phone
name1      addr1         phone1
name3      addr3         phone3

employee.csv
name | position | company |
name3 position3 company3

PickleRick · ‎05-28-2023

The inputlookup command is for something completely different - it returns the contents of your lookup file. If you want to perform a lookup, you just need to use... yes, the lookup command!

So your original idea was a good one, just with a wrong command

index=whatever
| lookup employee.csv
| [...]

yeahnah · ‎05-23-2023

Hi @LearningGuy

Here are some example methods

Only outputs results matched against name field in lookup file

index=testindex name=*
| lookup employee.csv name OUTPUT positon company

Outputs all results and tests whether a match in lookup file existed

index=testindex name=*
| inputlookup append=true employee.csv
| fields name address phone position company
| stats values(*) AS * BY name
| eval intestindex=if(isnotnull(company), "yes", "no")

Hope that helps

PickleRick · ‎05-28-2023

To verify if something exists in a lookup it's usually better to just do the lookup and verify the results either with something like

<...something...>
| lookup lookup.csv lookupfield OUTPUT lookupfield AS outputfield
| eval found=if(outputfield==lokupfield,1,0)

Or even

<...something...>
| lookup lookup.csv lookupfield OUTPUT lookupfield

(In this case the field values which are not in the lookup will get nulled-out).

The lookup-based approach is more effective compared to inputlookup and stats because lookup is a distributable streaming command and you can still continue processing your search pipeline on the indexers whereas stats will consolidate the results and move processing to SH.

LearningGuy · ‎05-23-2023

Hello,

index=testindex name=* | lookup employee.csv name OUTPUT position company
1) a) Is name=* required as a field that I want to look up?
b) if I want to lookup name on employee.csv that has name2, should I use | lookup employee.csv name2?
c) So if the field that I want to lookup is address, should I use address=*?
2) I used your query to perform lookup on employee.csv that does not have any match, but it still showed all output on the search result. It also did not show position and company field
Can you please fix this? Thanks

index=testindex name=*
| inputlookup append=true employee.csv
| fields name address phone position company
| stats values(*) AS * BY name
| eval intestindex=if(isnotnull(company), "yes", "no")

1) The search worked fine. Can you please explain the logic behind it?
2) I am not sure the purpose of append=true. Did it append something on csv files?
3) What is the purpose of: of values (*) AS * BY Name
4) What is the difference between inputlookup and lookup?

I appreciate your help. Thanks

yeahnah · ‎05-23-2023

The Splunk docs are a good place to get more acquainted with SPL commands

https://docs.splunk.com/Documentation/Splunk/9.0.4/SearchReference/Lookup
https://docs.splunk.com/Documentation/Splunk/9.0.4/SearchReference/Inputlookup

1. a) The name=* does a couple of things: it filters events that have name=<value>, and it also tells Splunk query that you are interested in this field so extract it as a key value pair. This may be important depending on the search mode you are using (fast, smart, verbose). And yes, it is a must for the lookup to work.

b) In this case, the event "name" field equates to a lookup csv header "name". If the search had a field called name1 then the lookup syntax is "lookup employee.csv name AS name1"

c) Yes, though note, altering your base search filters may also alter your result output. Like most scripting languages there are many ways to do things, which have there pros and cons. But generally, if you know you are interested in a field, (or fields), then using them in the base search helps ensure they are auto extracted and ready for subsequent usage.

2. Yeah, my mistake. You can filter for matches like this.

index=testindex name=*
| lookup employee.csv name OUTPUT positon company
| search company=*

If the lookup has a match (and assuming the event does not have a field called company) then event will be enriched with the company field result

For the second query, here are some comments...

index=testindex name=*
``` append the whole lookup file to the search results ```
| inputlookup append=true employee.csv
``` output the fields I'm interested in ```
| fields name address phone position company
``` summarise/transform the results into a table using name as the grouping jey``` 
| stats values(*) AS * BY name
``` add a new field with a tested value ```
| eval intestindex=if(isnotnull(company), "yes", "no")

As you may have noticed, the stats command changes the output so you do not see the _raw events anymore.

Play around with it to keep learning. Remove the bottom line and repeat search to see how each line changes the result set.

If this has answered your question then please mark this as solution provided.

LearningGuy · ‎05-24-2023

Hello,
I appreciate your help. My apology. It turns out that there are 2 fields that I needed to filter out.
The lookup solution that you provided worked, but not with 2 column.
The inputlookup solution that you provided give me all results, instead of providing a filter.
My goal is to lookup and filter, not only lookup. See the scenario below. Please help. Thank you so much

ip location (table 1)
1.1.1.1 NY
2.2.2.2 CA
3.3.3.3 TX
4.4.4.4 GA

ip1 | ip2 | name (table 2)
2.2.2.2 | | name2 (ip2 is empty on this row)
| 4.4.4.4 | name4 (ip1 is empty on this row)

output: (table 1 lookup and filter based on table 2)
ip | ip1 | ip2 | location | name
2.2.2.2 2.2.2.2 NY name2
4.4.4.4 4.4.4.4 GA name4

yeahnah · ‎05-24-2023

The aim was only ever to demonstrate how lookups could be used. Further filtering of results can be done with "search", "where", or the "regex" commands.

I'm unclear with the problem you have outlined as you have presented it all as tables. What is the lookup and what are the raw events. Maybe present the SPL you already have with examples of the raw events.

If you want to have the lookup match (table 1?) on the raw events (table2?) then you just need to do two lookups. Here's a run anywhere example

| makeresults| eval _raw="ip,location
1.1.1.1,NY
2.2.2.2,CA
3.3.3.3,TX
4.4.4.4,GA"
| multikv forceheader=1 | table ip location | outputlookup mytemplookup.csv  ``` 1. create a lookup file (table1)```
``` 2. create some dummy events (table2)```
| streamstats count
| eval ip1=if(count=2, ip, null())
      ,ip2=if(count=4, ip, null())
      ,name=case(count=2, "name2", count=4, "name4")
| table ip1 ip2 name
``` 3. the following uses the lookup and filters results ```
| lookup mytemplookup.csv ip AS ip1 OUTPUT ip location
| lookup mytemplookup.csv ip AS ip2 OUTPUT ip location
| search ip1=* OR ip2=*
| table ip* location name

The first two parts simply create some dummy data to demonstrate (an example) how you would use the SPL in the third part of the query to produce your output. Obviously, you'll need to alter this to work with what you have got so far.

Hopefully that makes sense

LearningGuy · ‎05-25-2023

Hello,
Sorry I wasn't clear. Table 1 is from raw event (syslog), let say index=table1. The content is not only 4 rows, it's more than 100k in 24 hour. I am trying to filter out that 100k ip addresses in table1 events from ip addresses listed in CSV (i called it table2 that has 2 column IP address1, ip address2)
Your initial lookup solution worked but only for 1 column (ipaddress_1).
If I do this manually, I would put the data in excel and perform a VLOOKUP and filter out the row
Thank you!!

LearningGuy · ‎05-26-2023

Hey Yeahnah,
I think I figured it out using lookup below:
index=table1
| lookup table2.csv ip1 as ip OUTPUTNEW ip
| lookup table2.csv ip2 as ip OUTPUTNEW ip
| search ip1=* or ip2=*
| table ip, ip1, ip2, location, name

Although I am still not sure why it worked by doing lookup twice 🙂 and not sure why search work with OR instead of AND (filter out both blank ip1 and ip2)
And I was unable to get it to work using inputlookup
Thanks for your help!

How to perform lookup on CSV file from search on index?

join

lookup

Announcing Scheduled Export GA for Dashboard Studio

Extending Observability Content to Splunk Cloud

More Control Over Your Monitoring Costs with Archived Metrics GA in US-AWS!