How do I perform a lookup against a CSV file from an index search without the data being combined into one row (and without using mvexpand)?
index=vulnerability_index
| table ip, vulnerability, score
ip | vulnerability | score |
192.168.1.1 | SQL Injection | 9 |
192.168.1.1 | OpenSSL | 7 |
192.168.1.2 | Cross Site-Scripting | 8 |
192.168.1.2 | DNS | 5 |
CSV file: company.csv
ip_address | company | location |
192.168.1.1 | Comp-A | Loc-A |
192.168.1.2 | Comp-B | Loc-B |
192.168.1.5 | Comp-E | Loc-E |
Lookup from the index into the CSV:
index=vulnerability_index
| lookup company.csv ip_address as ip OUTPUTNEW ip_address, company, location
The vulnerability and score values are merged into one row:
ip_address | company | location | vulnerability | score |
192.168.1.1 | Comp-A | Loc-A | SQL Injection OpenSSL | 9 7 |
192.168.1.2 | Comp-B | Loc-B | Cross Site-Scripting DNS | 8 5 |
How do I keep each matched ip_address on a separate row without using mvexpand?
mvexpand has a memory limit and is very slow; it does not work on a large data set or time frame.
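(As a side note on that memory limit: mvexpand's output is capped by the `max_mem_usage_mb` setting in limits.conf on the search head, so raising it can postpone the truncation, though it does not make mvexpand fast. A minimal sketch, assuming the setting lives in the default stanza:

```
# limits.conf on the search head -- assumption: [default] stanza governs mvexpand
[default]
max_mem_usage_mb = 1000
```

A restart is needed for the change to take effect.)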
My expected result is:
ip_address | company | location | vulnerability | Score |
192.168.1.1 | Comp-A | Loc-A | SQL Injection | 9 |
192.168.1.1 | Comp-A | Loc-A | OpenSSL | 7 |
192.168.1.2 | Comp-B | Loc-B | Cross Site-Scripting | 8 |
192.168.1.2 | Comp-B | Loc-B | DNS | 5 |
Thank you for your help
Try this (although there is no guarantee you won't face memory issues with large data sets regardless of whether you use mvexpand or not):
| makeresults
| eval _raw="ip_address vulnerability score
192.168.1.1 SQL Injection 9
192.168.1.1 OpenSSL 7
192.168.1.2 Cross Site-Scripting 8
192.168.1.2 DNS 5"
| multikv forceheader=1
| table ip_address vulnerability score
| lookup company.csv ip_address
| eval _rows=mvrange(0,mvcount(company))
| eval company=mvzip(_rows, company, "|")
| eval location=mvzip(_rows, location, "|")
| stats values(*) as * by ip_address vulnerability score _rows
| eval company=mvindex(split(mvindex(company, _rows),"|"),1)
| eval location=mvindex(split(mvindex(location, _rows),"|"),1)
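The trick above works because `stats ... by` a multivalue field fans each event out into one row per value; prefixing each company/location value with its index via `mvzip` keeps the pairs aligned so `mvindex`/`split` can recover the matching value on each row. The same pattern applied directly to the indexed data instead of the emulation might look like this (a sketch; it assumes the event field is `ip` as shown in the first table):

```
index=vulnerability_index
| lookup company.csv ip_address AS ip OUTPUTNEW ip_address, company, location
| eval _rows=mvrange(0, mvcount(company))
| eval company=mvzip(_rows, company, "|"), location=mvzip(_rows, location, "|")
| stats values(*) as * by ip vulnerability score _rows
| eval company=mvindex(split(mvindex(company, _rows), "|"), 1)
| eval location=mvindex(split(mvindex(location, _rows), "|"), 1)
```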
+1 to what @ITWhisperer said - if your search from the index really produces the results you showed in the first table, a simple lookup should not collapse them into multivalue fields as shown in the last table. There must be something you're not showing us here.
Hello,
I responded to ITWhisperer. Please take a look.
I appreciate your help.
Thank you
The search you have shown should produce results similar to what you want:
| makeresults
| fields - _time
| eval _raw="ip vulnerability score
192.168.1.1 SQL Injection 9
192.168.1.1 OpenSSL 7
192.168.1.2 Cross Site-Scripting 8
192.168.1.2 DNS 5"
| multikv forceheader=1
| table ip vulnerability score
| lookup company.csv ip_address as ip OUTPUTNEW ip_address, company, location
If you are getting the results you have indicated, i.e. with vulnerability and score as multivalue fields, there must be something else in your search (which you haven't shared) that is collapsing the events by ip, company, or location.
Hello,
Sorry, you are correct. Thank you for your help. I was not aware that a lookup from the index to a CSV would not return duplicate rows on the index side; I was confused by the output from "append".
I updated company.csv to show that one IP can belong to multiple companies, so when lookup is used, it merges the data in the "company" and "location" fields.
1) How do I expand the merged data in the company and location fields into individual events without using mvexpand?
2) If I have to use mvexpand, what is an efficient way to use it without hitting the memory limit?
Thank you so much for your help
index=vulnerability_index
| table ip, vulnerability, score
ip | vulnerability | score |
192.168.1.1 | SQL Injection | 9 |
192.168.1.1 | OpenSSL | 7 |
192.168.1.2 | Cross Site-Scripting | 8 |
192.168.1.2 | DNS | 5 |
company.csv
ip_address | company | location |
192.168.1.1 | Comp-A | Loc-A |
192.168.1.1 | Comp-Z | Loc-Z |
192.168.1.2 | Comp-B | Loc-B |
192.168.1.2 | Comp-Y | Loc-Y |
192.168.1.5 | Comp-E | Loc-E |
| lookup company.csv ip_address as ip_address OUTPUTNEW ip_address, company, location
ip_address | vulnerability | score | company | location |
192.168.1.1 | SQL Injection | 9 | Comp-A Comp-Z | Loc-A Loc-Z |
192.168.1.1 | OpenSSL | 7 | Comp-A Comp-Z | Loc-A Loc-Z |
192.168.1.2 | Cross Site-Scripting | 8 | Comp-B Comp-Y | Loc-B Loc-Y |
192.168.1.2 | DNS | 5 | Comp-B Comp-Y | Loc-B Loc-Y |
Expected result (without using mvexpand):
ip_address | vulnerability | score | company | location |
192.168.1.1 | SQL Injection | 9 | Comp-A | Loc-A |
192.168.1.1 | SQL Injection | 9 | Comp-Z | Loc-Z |
192.168.1.1 | OpenSSL | 7 | Comp-A | Loc-A |
192.168.1.1 | OpenSSL | 7 | Comp-Z | Loc-Z |
192.168.1.2 | Cross Site-Scripting | 8 | Comp-B | Loc-B |
192.168.1.2 | Cross Site-Scripting | 8 | Comp-Y | Loc-Y |
192.168.1.2 | DNS | 5 | Comp-B | Loc-B |
192.168.1.2 | DNS | 5 | Comp-Y | Loc-Y |
Similar to your follow-up question in Re: How to perform lookup from index search with d..., you can use JSON functions if your instance is 8.0 or later.
index=vulnerability_index
| table ip_address vulnerability score
| lookup company.csv ip_address as ip_address OUTPUTNEW ip_address, company, location
| eval idx = mvrange(0, mvcount(company))
| eval company = mvmap(idx, json_object("company", mvindex(company, idx), "location", mvindex(location, idx)))
| stats values(vulnerability) as vulnerability values(score) as score values(company) as company by ip_address
| stats values(ip_address) as ip_address by vulnerability score company
| eval location = json_extract(company, "location"), company = json_extract(company, "company")
Hello,
Thank you for your help. I ran your query on the emulated data and it gave me 16 results instead of the 8 in the expected result. This topic is index vs. CSV using lookup; the other topic is index vs. dbxquery using append. lookup and append do not seem to behave the same.
Comp-A should only have 2 vulnerabilities (OpenSSL with score 7 and SQL Injection with score 9).
When I ran it on real data, it gave me merged ip_address values. Please suggest. Thank you
vulnerability | score | company | ip_address | location |
OpenSSL | 7 | Comp-A | 192.168.1.1 | Loc-A |
OpenSSL | 7 | Comp-Z | 192.168.1.1 | Loc-Z |
OpenSSL | 9 | Comp-A | 192.168.1.1 | Loc-A |
OpenSSL | 9 | Comp-Z | 192.168.1.1 | Loc-Z |
SQL Injection | 7 | Comp-A | 192.168.1.1 | Loc-A |
SQL Injection | 7 | Comp-Z | 192.168.1.1 | Loc-Z |
SQL Injection | 9 | Comp-A | 192.168.1.1 | Loc-A |
SQL Injection | 9 | Comp-Z | 192.168.1.1 | Loc-Z |
Cross Site-Scripting | 5 | Comp-B | 192.168.1.2 | Loc-B |
Cross Site-Scripting | 5 | Comp-Y | 192.168.1.2 | Loc-Y |
Cross Site-Scripting | 8 | Comp-B | 192.168.1.2 | Loc-B |
Cross Site-Scripting | 8 | Comp-Y | 192.168.1.2 | Loc-Y |
DNS | 5 | Comp-B | 192.168.1.2 | Loc-B |
DNS | 5 | Comp-Y | 192.168.1.2 | Loc-Y |
DNS | 8 | Comp-B | 192.168.1.2 | Loc-B |
DNS | 8 | Comp-Y | 192.168.1.2 | Loc-Y |
My mistake. The difference is not append vs lookup; it is because I (mis)placed the JSON company in values() instead of in the groupby. lookup is much simpler to handle, in fact. (And thanks for sharing the output from lookup, which I didn't realize was available - it made diagnosis much easier.)
index=vulnerability_index
| lookup company.csv ip_address as ip_address OUTPUTNEW ip_address, company, location
| eval idx = mvrange(0, mvcount(company))
| eval company = mvmap(idx, json_object("company", mvindex(company, idx), "location", mvindex(location, idx)))
| stats values(score) as score by ip_address company vulnerability
| fromjson company
Using the sample lookup output you posted, the results are:
ip_address | company | vulnerability | score | location |
192.168.1.1 | Comp-A | OpenSSL | 7 | Loc-A |
192.168.1.1 | Comp-A | SQL Injection | 9 | Loc-A |
192.168.1.1 | Comp-Z | OpenSSL | 7 | Loc-Z |
192.168.1.1 | Comp-Z | SQL Injection | 9 | Loc-Z |
192.168.1.2 | Comp-B | Cross Site-Scripting | 8 | Loc-B |
192.168.1.2 | Comp-B | DNS | 5 | Loc-B |
192.168.1.2 | Comp-Y | Cross Site-Scripting | 8 | Loc-Y |
192.168.1.2 | Comp-Y | DNS | 5 | Loc-Y |
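One compatibility note on the query above: the `fromjson` command was added in Splunk 9.0. On earlier versions that have the JSON eval functions, the final `| fromjson company` line can be replaced with an explicit `json_extract`, e.g. (a sketch; extract location before overwriting company):

```
| eval location = json_extract(company, "location"), company = json_extract(company, "company")
```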
Hello,
I tested your query and it worked on both emulated and real data.
The query time is faster than the mvindex approach suggested by @ITWhisperer. I would accept both solutions if I could.
You are correct. This post is using lookup while the other post was actually using "append", where the new issue with duplicate companies has not been solved:
https://community.splunk.com/t5/Splunk-Search/How-to-perform-lookup-from-index-search-with-dbxquery/...
*** NULL issue ***
Also, if the company is empty, your suggested query will not show that company.
For example, if Comp-A and Comp-Y are removed, the output has only 4 rows, as shown in Table 2.
I tried to use the following commands, but they did not help. Please help. Thank you so much
| fillnull value="N/A" company
| eval company = if(company="" or isnull(company), "N/A",company)
Table1 - company.csv
ip_address | company | location |
192.168.1.1 | | Loc-A |
192.168.1.1 | Comp-Z | Loc-Z |
192.168.1.2 | Comp-B | Loc-B |
192.168.1.2 | | Loc-Y |
192.168.1.5 | Comp-E | Loc-C |
Table2
Result after the query:
192.168.1.1 | OpenSSL | 7 | Comp-Z | Loc-A |
192.168.1.1 | SQL Injection | 9 | Comp-Z | Loc-A |
192.168.1.2 | Cross Site-Scripting | 8 | Comp-B | Loc-B |
192.168.1.2 | DNS | 5 | Comp-B | Loc-B |
Either should work. Where do you insert them?
index=vulnerability_index
| lookup company.csv ip_address as ip_address OUTPUTNEW ip_address, company, location
| fillnull value="N/A" company
| eval idx = mvrange(0, mvcount(company))
| eval company = mvmap(idx, json_object("company", mvindex(company, idx), "location", mvindex(location, idx)))
| stats values(score) as score by ip_address company vulnerability
| fromjson company
Hello,
Thank you for looking into this. I put fillnull after the lookup command; see below.
If you create a new CSV file called company2.csv and replace Comp-A and Comp-Y with empty values as follows:
ip_address | company | location |
192.168.1.1 | | Loc-A |
192.168.1.1 | Comp-Z | Loc-Z |
192.168.1.2 | Comp-B | Loc-B |
192.168.1.2 | | Loc-Y |
192.168.1.5 | Comp-E | Loc-E |
Run the following query using company2.csv and "fillnull":
| makeresults
| eval _raw="ip_address vulnerability score
192.168.1.1 SQL Injection 9
192.168.1.1 OpenSSL 7
192.168.1.2 Cross Site-Scripting 8
192.168.1.2 DNS 5"
| multikv forceheader=1
| lookup company2.csv ip_address as ip_address OUTPUTNEW ip_address, company, location
| fillnull value="N/A" company
| eval idx = mvrange(0, mvcount(company))
| eval company = mvmap(idx, json_object("company", mvindex(company, idx), "location", mvindex(location, idx)))
| stats values(score) as score by ip_address company vulnerability
| fromjson company
The result using company2.csv is:
ip_address | company | vulnerability | score | location |
192.168.1.1 | Comp-Z | OpenSSL | 7 | Loc-A |
192.168.1.1 | Comp-Z | SQL Injection | 9 | Loc-A |
192.168.1.2 | Comp-B | Cross Site-Scripting | 8 | Loc-B |
192.168.1.2 | Comp-B | DNS | 5 | Loc-B |
The result using company.csv (without removing Comp-A and Comp-Y) is:
ip_address | company | vulnerability | score | location |
192.168.1.1 | Comp-A | OpenSSL | 7 | Loc-A |
192.168.1.1 | Comp-A | SQL Injection | 9 | Loc-A |
192.168.1.1 | Comp-Z | OpenSSL | 7 | Loc-Z |
192.168.1.1 | Comp-Z | SQL Injection | 9 | Loc-Z |
192.168.1.2 | Comp-B | Cross Site-Scripting | 8 | Loc-B |
192.168.1.2 | Comp-B | DNS | 5 | Loc-B |
192.168.1.2 | Comp-Y | Cross Site-Scripting | 8 | Loc-Y |
192.168.1.2 | Comp-Y | DNS | 5 | Loc-Y |
A side note (separate issue):
On real data, if I run "sort" at the end of your JSON query, some data goes missing (compared to no sort, or to the non-JSON query). I am not sure what is causing it, as it works fine on emulated data. Thanks
That is because company can be multivalued after this kind of lookup, so you need to handle that inside mvmap.
index=vulnerability_index
| lookup company.csv ip_address as ip_address OUTPUTNEW ip_address, company, location
| eval company = mvmap(company, if(isnull(company), "N/A", company))
| eval idx = mvrange(0, mvcount(company))
| eval company = mvmap(idx, json_object("company", mvindex(company, idx), "location", mvindex(location, idx)))
| stats values(score) as score by ip_address company vulnerability
| fromjson company
Hello,
After the lookup command was executed, the rows with a blank company were excluded, so the following eval did not take effect:
| eval company = mvmap(company, if(isnull(company), "N/A", company))
If I put the eval command before the lookup, it did not work because the company field comes from the CSV file and only exists after the lookup command runs. The issue is: how do we fill the NULLs in the CSV file?
Thank you
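One way to fill the NULLs at the source is to rewrite the lookup file itself so every row has a non-empty company before the lookup ever runs. A sketch, assuming a hypothetical cleaned copy named company_filled.csv:

```
| inputlookup company.csv
| eval company=if(isnull(company) OR company="", "N/A", company)
| outputlookup company_filled.csv
```

The main search would then reference company_filled.csv instead of company.csv in its lookup command.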
Try this (although there is no guarantee you won't face memory issues with large data sets regardless of whether you use mvexpand or not):
| makeresults
| eval _raw="ip_address vulnerability score
192.168.1.1 SQL Injection 9
192.168.1.1 OpenSSL 7
192.168.1.2 Cross Site-Scripting 8
192.168.1.2 DNS 5"
| multikv forceheader=1
| table ip_address vulnerability score
| lookup company.csv ip_address
| eval _rows=mvrange(0,mvcount(company))
| eval company=mvzip(_rows, company, "|")
| eval location=mvzip(_rows, location, "|")
| stats values(*) as * by ip_address vulnerability score _rows
| eval company=mvindex(split(mvindex(company, _rows),"|"),1)
| eval location=mvindex(split(mvindex(location, _rows),"|"),1)
Hello,
Thank you for your help. I ran your query on both emulated data and real data and it worked fine.
1. I found duplicate rows and was able to remove them with dedup. At first I thought values() would remove duplicates. Is there a way to remove duplicates without using dedup?
2. Can I use lookup and the same splitting method you suggested if I am combining an index and dbxquery (instead of a CSV), or do I have to use "append"/"join" with the same method?
Here's the other post - the solution provided there is good only if IPs are not duplicated. Thank you again!!
https://community.splunk.com/t5/Splunk-Search/How-to-perform-lookup-from-index-search-with-dbxquery/...
It depends on the source of the duplicates.
lookup is not a generating command whereas dbxquery is (I think), so I suspect you would have to use append (or join).
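An append-based version of the same combination might look like this (a sketch only; the DB Connect connection name, table name, and SQL are assumptions, not from the thread):

```
index=vulnerability_index
| rename ip AS ip_address
| append
    [| dbxquery connection=my_db query="SELECT ip_address, company, location FROM companies"]
| eventstats values(company) AS company values(location) AS location by ip_address
| where isnotnull(vulnerability)
```

After this, company and location are multivalue fields on each vulnerability event, so the same mvzip/stats splitting shown earlier can be applied.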
Hello,
If the company is empty, your suggested query (stats values) will not show that company.
For example, if Comp-A and Comp-Y are removed, the output has only 4 rows, as shown in Table 2.
I tried to use the following commands, but they did not help. Please help. Thank you so much
| fillnull value="N/A" company
| eval company = if(company="" or isnull(company), "N/A",company)
Table1 - company.csv
ip_address | company | location |
192.168.1.1 | | Loc-A |
192.168.1.1 | Comp-Z | Loc-Z |
192.168.1.2 | Comp-B | Loc-B |
192.168.1.2 | | Loc-Y |
192.168.1.5 | Comp-E | Loc-C |
Table2
Result after the query:
192.168.1.1 | OpenSSL | 7 | Comp-Z | Loc-A |
192.168.1.1 | SQL Injection | 9 | Comp-Z | Loc-A |
192.168.1.2 | Cross Site-Scripting | 8 | Comp-B | Loc-B |
192.168.1.2 | DNS | 5 | Comp-B | Loc-B |