Solved: How to perform lookup in CSV file from index witho...

LearningGuy · ‎07-21-2023

How to perform lookup in CSV file from index without combining data in one row (and without mvexpand)?

| index=vulnerability_index
| table ip, vulnerability, score

ip	vulnerability	score
192.168.1.1	SQL Injection	9
192.168.1.1	OpenSSL	7
192.168.1.2	Cross Site-Scripting	8
192.168.1.2	DNS	5

CSV file: company.csv

ip_address	company	location
192.168.1.1	Comp-A	Loc-A
192.168.1.2	Comp-B	Loc-B
192.168.1.5	Comp-E	Loc-E

Lookup in CSV from index
| index=vulnerability_index
| lookup company.csv ip_address as ip OUTPUTNEW ip_address, company, location
The vulnerability and score merged into one row.

ip_address	company	location	vulnerability	score
192.168.1.1	Comp-A	Loc-A	SQL Injection OpenSSL	9 7
192.168.1.2	Comp-B	Loc-B	Cross Site-Scripting DNS	8 5

How do I match the ip_address in a separate row without using mvexpand?
MVExpand has memory limitation and is very slow. It will not work in a large data set and time frame
My expected result is:

ip_address	company	location	vulnerability	Score
192.168.1.1	Comp-A	Loc-A	SQL Injection	9
192.168.1.1	Comp-A	Loc-A	OpenSSL	7
192.168.1.2	Comp-B	Loc-B	Cross Site-Scripting	8
192.168.1.2	Comp-B	Loc-B	DNS	5

Thank you for your help

ITWhisperer · ‎07-25-2023

Try this (although there is no guarantee you won't face memory issues with large data sets regardless of whether you use mvexpand or not):

| makeresults
| eval _raw="ip_address	vulnerability	score
192.168.1.1	SQL Injection	9
192.168.1.1	OpenSSL	7
192.168.1.2	Cross Site-Scripting	8
192.168.1.2	DNS	5"
| multikv forceheader=1
| table ip_address vulnerability score
| lookup company.csv ip_address
| eval _rows=mvrange(0,mvcount(company))
| eval company=mvzip(_rows, company, "|")
| eval location=mvzip(_rows, location, "|")
| stats values(*) as * by ip_address vulnerability score _rows
| eval company=mvindex(split(mvindex(company, _rows),"|"),1)
| eval location=mvindex(split(mvindex(location, _rows),"|"),1)

View solution in original post

PickleRick · ‎07-22-2023

+1 to what @ITWhisperer said - your search from the index, if it really produces results as you showed in the first table, should not collapse results after simple lookup into multivalued fields as you showed in the last table. So there must be something you're not showing us here.

LearningGuy · ‎07-24-2023

Hello,
I responded to ITWhisperer. Please take a look.
I appreciate your help.

Thank you

ITWhisperer · ‎07-22-2023

The search you have used would produce results similar to what you want

| makeresults
| fields - _time
| eval _raw="ip	vulnerability	score
192.168.1.1	SQL Injection	9
192.168.1.1	OpenSSL	7
192.168.1.2	Cross Site-Scripting	8
192.168.1.2	DNS	5"
| multikv forceheader=1
| table ip vulnerability score
| lookup company.csv ip_address as ip OUTPUTNEW ip_address, company, location

If you are getting the results you have indicated, i.e. with the vulnerability and score as multivalue fields, there must be something else in your search doing this which you haven't shared, which is collapsing the events by ip, company or location.

LearningGuy · ‎07-24-2023

Hello,
Sorry. You are correct. Thank you for your help. I was not aware that lookup from index to CSV will not return duplicate data on the index side. I was confused with the output from "append"
I updated the company.csv showing that one IP can be used in multiple companies, so when lookup is used, it merged data in "company" and "location" field.
1) How do I expand merged data in the company and location fields into individual event without using mvexpand?
2) If I have to use mvexpand, what an efficient way to use it without hitting memory limit?
Thank you so much for your help

| index=vulnerability_index
| table ip, vulnerability, score

ip	vulnerability	score
192.168.1.1	SQL Injection	9
192.168.1.1	OpenSSL	7
192.168.1.2	Cross Site-Scripting	8
192.168.1.2	DNS	5

company.csv

ip_address	company	location
192.168.1.1	Comp-A	Loc-A
192.168.1.1	Comp-Z	Loc-Z
192.168.1.2	Comp-B	Loc-B
192.168.1.2	Comp-Y	Loc-Y
192.168.1.5	Comp-E	Loc-E

| lookup company.csv ip_address as ip_address OUTPUTNEW ip_address, company, location

ip_address	vulnerability	score	company	location
192.168.1.1	SQL Injection	9	Comp-A Comp-Z	Loc-A Loc-Z
192.168.1.1	OpenSSL	7	Comp-A Comp-Z	Loc-A Loc-Z
192.168.1.2	Cross Site-Scripting	8	Comp-B Comp-Y	Loc-B Loc-Y
192.168.1.2	DNS	5	Comp-B Comp-Y	Loc-B Loc-Y

Expected result with query (without mvexpand):

ip_address	vulnerability	score	company	location
192.168.1.1	SQL Injection	9	Comp-A	Loc-A
192.168.1.1	SQL Injection	9	Comp-Z	Loc-Z
192.168.1.1	OpenSSL	7	Comp-A	Loc-A
192.168.1.1	OpenSSL	7	Comp-Z	Loc-Z
192.168.1.2	Cross Site-Scripting	8	Comp-B	Loc-B
192.168.1.2	Cross Site-Scripting	8	Comp-Y	Loc-Y
192.168.1.2	DNS	5	Comp-B	Loc-B
192.168.1.2	DNS	5	Comp-Y	Loc-Y

yuanliu · ‎07-25-2023

Similar to your follow-up question in Re: How to perform lookup from index search with d...,, you can use JSON functions if your instance is 8.0 and up.

index = vulnerability_index ``` 
| table ip_address vulnerability score ```
| lookup company.csv ip_address as ip_address OUTPUTNEW ip_address, company, location
| eval idx = mvrange(0, mvcount(company))
| eval company = mvmap(idx, json_object("company", mvindex(company, idx), "location", mvindex(location, idx))
| stats values(vulnerability) as vulnerability values(score) as score values(company) as company by ip_address
| stats values(ip_address) as ip_address by vulnerability score company
| eval location = json_extract(company, "location"), company = json_extract(company, "company")

LearningGuy · ‎07-26-2023

Hello,
Thank you for your help. I ran your query in the emulated data and it gave me 16 results, instead of 8 in the expected result. This topic is Index vs CSV using lookup, the other topic is index vs DBXquery using append. Lookup vs append does not seem to behave the same.
Comp-A only should only have 2 vulnerabilities (1 OpenSSL - with score 7 & 1 SQL Injection - with score - 9)
When I ran it in real data, it gave me merged ip_address. Please suggest. Thank you

vulnerability	score	company	ip_address	location
OpenSSL	7	Comp-A	192.168.1.1	Loc-A
OpenSSL	7	Comp-Z	192.168.1.1	Loc-Z
OpenSSL	9	Comp-A	192.168.1.1	Loc-A
OpenSSL	9	Comp-Z	192.168.1.1	Loc-Z
SQL Injection	7	Comp-A	192.168.1.1	Loc-A
SQL Injection	7	Comp-Z	192.168.1.1	Loc-Z
SQL Injection	9	Comp-A	192.168.1.1	Loc-A
SQL Injection	9	Comp-Z	192.168.1.1	Loc-Z
Cross Site-Scripting	5	Comp-B	192.168.1.2	Loc-B
Cross Site-Scripting	5	Comp-Y	192.168.1.2	Loc-Y
Cross Site-Scripting	8	Comp-B	192.168.1.2	Loc-B
Cross Site-Scripting	8	Comp-Y	192.168.1.2	Loc-Y
DNS	5	Comp-B	192.168.1.2	Loc-B
DNS	5	Comp-Y	192.168.1.2	Loc-Y
DNS	8	Comp-B	192.168.1.2	Loc-B
DNS	8	Comp-Y	192.168.1.2	Loc-Y

yuanliu · ‎07-29-2023

My mistake. The difference is not in append vs lookup; it is because I (mis)placed JSON company in values() as opposed to in groupby. Lookup is much simpler to handle, in fact. (And thanks for sharing the output from lookup which I didn't realize was available. It made diagnosis so much easier.)

| index=vulnerability_index
| lookup company.csv ip_address as ip_address OUTPUTNEW ip_address, company, location
| eval idx = mvrange(0, mvcount(company))
| eval company = mvmap(idx, json_object("company", mvindex(company, idx), "location", mvindex(location, idx)))
| stats values(score) as score by ip_address company vulnerability
| fromjson company

Using the sample lookup output you posted, the results are

ip_address	company	vulnerability	score	location
192.168.1.1	Comp-A	OpenSSL	7	Loc-A
192.168.1.1	Comp-A	SQL Injection	9	Loc-A
192.168.1.1	Comp-Z	OpenSSL	7	Loc-Z
192.168.1.1	Comp-Z	SQL Injection	9	Loc-Z
192.168.1.2	Comp-B	Cross Site-Scripting	8	Loc-B
192.168.1.2	Comp-B	DNS	5	Loc-B
192.168.1.2	Comp-Y	Cross Site-Scripting	8	Loc-Y
192.168.1.2	Comp-Y	DNS	5	Loc-Y

LearningGuy · ‎07-31-2023

Hello,
I tested your query and it worked on both emulated and real data.
The query time is faster than mvindex suggested by ITWhisperer. I would accept both solutions if I could.
You are correct. This post is using outputlookup while the other post was actually using "append", which "the new issue with duplicate companies" has not been solved
https://community.splunk.com/t5/Splunk-Search/How-to-perform-lookup-from-index-search-with-dbxquery/...

** NULL issue ***
Also, If the company is empty, your suggested query will not show the company
For example: Comp-A and Comp-Y are removed. The output only 4 rows as shown on Table 2.
I tried to use the following command, but it did not help. Please help. Thank you so much

| fillnull value="N/A" company

| eval company = if(company="" or isnull(company), "N/A",company)

Table1 - company.csv

ip_address	company	location
192.168.1.1		Loc-A
192.168.1.1	Comp-Z	Loc-Z
192.168.1.2	Comp-B	Loc-B
192.168.1.2		Loc-Y
192.168.1.5	Comp-E	Loc-C

Table2
Result after the query:

ip_address vulnerability score company location

192.168.1.1	OpenSSL	7	Comp-Z	Loc-A
192.168.1.1	SQL Injection	9	Comp-Z	Loc-A
192.168.1.2	Cross Site-Scripting	8	Comp-B	Loc-B
192.168.1.2	DNS	5	Comp-B	Loc-B

yuanliu · ‎07-31-2023

Either should work. Where do you insert them?

| index=vulnerability_index
| lookup company.csv ip_address as ip_address OUTPUTNEW ip_address, company, location
| fillnull value="N/A" company
| eval idx = mvrange(0, mvcount(company))
| eval company = mvmap(idx, json_object("company", mvindex(company, idx), "location", mvindex(location, idx)))
| stats values(score) as score by ip_address company vulnerability
| fromjson company

LearningGuy · ‎08-01-2023

Hello,
Thank you for looking into this. I put the fillnull after the lookup command. See below:
If you create a new CSV file called company2.csv and replace Comp-A and Comp-Y with "empty" as follows:

ip_address	company	location
192.168.1.1		Loc-A
192.168.1.1	Comp-Z	Loc-Z
192.168.1.2	Comp-B	Loc-B
192.168.1.2		Loc-Y
192.168.1.5	Comp-E	Loc-E

Run the following query using company2.csv and "fillnull":

makeresults
| eval _raw="ip_address vulnerability score
192.168.1.1 SQL Injection 9
192.168.1.1 OpenSSL 7
192.168.1.2 Cross Site-Scripting 8
192.168.1.2 DNS 5"
| multikv forceheader=1
| lookup company2.csv ip_address as ip_address OUTPUTNEW ip_address, company, location
| fillnull value="N/A" company
| eval idx = mvrange(0, mvcount(company))
| eval company = mvmap(idx, json_object("company", mvindex(company, idx), "location", mvindex(location, idx)))
| stats values(score) as score by ip_address company vulnerability
| fromjson company

The result using company2.csv is:

ip_address company vulnerability score location

192.168.1.1	Comp-Z	OpenSSL	7	Loc-A
192.168.1.1	Comp-Z	SQL Injection	9	Loc-A
192.168.1.2	Comp-B	Cross Site-Scripting	8	Loc-B
192.168.1.2	Comp-B	DNS	5	Loc-B

The result using company.csv (without removing Comp-A and Comp-Y) is:

ip_address company vulnerability score location

192.168.1.1	Comp-A	OpenSSL	7	Loc-A
192.168.1.1	Comp-A	SQL Injection	9	Loc-A
192.168.1.1	Comp-Z	OpenSSL	7	Loc-Z
192.168.1.1	Comp-Z	SQL Injection	9	Loc-Z
192.168.1.2	Comp-B	Cross Site-Scripting	8	Loc-B
192.168.1.2	Comp-B	DNS	5	Loc-B
192.168.1.2	Comp-Y	Cross Site-Scripting	8	Loc-Y
192.168.1.2	Comp-Y	DNS	5	Loc-Y

For side note (separate issue)
in real data, if I ran "sort" at the end of your JSON query, there are some missing data (compared to no sort or using non JSON query). I am not sure what causing it as it worked just fine in emulated data. Thanks

yuanliu · ‎08-03-2023

Because company can be multivalued after this kind of lookup. So, you need to handle that in mvmap.

index=vulnerability_index
| lookup company.csv ip_address as ip_address OUTPUTNEW ip_address, company, location
| eval company = mvmap(company, if(isnull(company), "N/A", company))
| eval idx = mvrange(0, mvcount(company))
| eval company = mvmap(idx, json_object("company", mvindex(company, idx), "location", mvindex(location, idx)))
| stats values(score) as score by ip_address company vulnerability
| fromjson company

LearningGuy · ‎08-21-2023

Hello,

After lookup command was executed, the row with the blank company were excluded, so the following eval company did not take into effect.

| eval company = mvmap(company, if(isnull(company), "N/A", company))

If I put eval command before the lookup, it did not work because the company field was in the CSV file which took place after the lookup command. The issue is: how do we fill NULL in the CSV file.
Thank you

ITWhisperer · ‎07-25-2023

Try this (although there is no guarantee you won't face memory issues with large data sets regardless of whether you use mvexpand or not):

| makeresults
| eval _raw="ip_address	vulnerability	score
192.168.1.1	SQL Injection	9
192.168.1.1	OpenSSL	7
192.168.1.2	Cross Site-Scripting	8
192.168.1.2	DNS	5"
| multikv forceheader=1
| table ip_address vulnerability score
| lookup company.csv ip_address
| eval _rows=mvrange(0,mvcount(company))
| eval company=mvzip(_rows, company, "|")
| eval location=mvzip(_rows, location, "|")
| stats values(*) as * by ip_address vulnerability score _rows
| eval company=mvindex(split(mvindex(company, _rows),"|"),1)
| eval location=mvindex(split(mvindex(location, _rows),"|"),1)

LearningGuy · ‎07-26-2023

Hello,
Thank you for your help. I ran your query on both emulated data and real data and it worked fine.

1. I found duplicates rows and was able to remove duplicate rows with dedup. At first I thought values would remove duplicates. Is there a way to remove duplicates without using dedup?

2. Can I use lookup and the same splitting methods you suggested if I am combining index and DBXquery (instead of CSV), or I have to use "Append/join" with the same method?

Here's the other post - The solution provided was good only if IPs are not duplicate. Thank you again!!
https://community.splunk.com/t5/Splunk-Search/How-to-perform-lookup-from-index-search-with-dbxquery/...

ITWhisperer · ‎07-26-2023

It depends on the source of the duplicates.

lookup is not a generating command whereas dbxquery is (I think), so I suspect you would have to use append (or join).

LearningGuy · ‎07-31-2023

Hello,

If the company is empty, your suggested query (stats value) will not show the company
For example: Comp-A and Comp-Y are removed. The output only 4 rows as shown on Table 2.
I tried to use the following command, but it did not help. Please help. Thank you so much

| fillnull value="N/A" company

| eval company = if(company="" or isnull(company), "N/A",company)

Table1 - company.csv

ip_address	company	location
192.168.1.1		Loc-A
192.168.1.1	Comp-Z	Loc-Z
192.168.1.2	Comp-B	Loc-B
192.168.1.2		Loc-Y
192.168.1.5	Comp-E	Loc-C

Table2
Result after the query:

ip_address vulnerability score company location

192.168.1.1	OpenSSL	7	Comp-Z	Loc-A
192.168.1.1	SQL Injection	9	Comp-Z	Loc-A
192.168.1.2	Cross Site-Scripting	8	Comp-B	Loc-B
192.168.1.2	DNS	5	Comp-B	Loc-B

How to perform lookup in CSV file from index without combining data in one row (and without mvexpand)?

join

lookup

other

Join Us for Splunk University and Get Your Bootcamp Game On!

.conf24 | Learning Tracks for Security, Observability, Platform, and Developers!

Announcing Scheduled Export GA for Dashboard Studio