Splunk Search

How to perform lookup in CSV file from index without combining data in one row (and without mvexpand)?

LearningGuy
Builder

How to perform lookup in CSV file from index without combining data in one row (and without mvexpand)?

| index=vulnerability_index
| table ip, vulnerability, score

ipvulnerabilityscore
192.168.1.1SQL Injection9
192.168.1.1OpenSSL7
192.168.1.2Cross Site-Scripting8
192.168.1.2DNS5


CSV file:   company.csv

ip_addresscompanylocation
192.168.1.1Comp-ALoc-A
192.168.1.2Comp-BLoc-B
192.168.1.5Comp-ELoc-E


Lookup in CSV from index
| index=vulnerability_index
| lookup company.csv ip_address as ip OUTPUTNEW ip_address, company, location
The vulnerability and score merged into one row.

ip_addresscompanylocationvulnerabilityscore
192.168.1.1Comp-ALoc-ASQL Injection
OpenSSL
9
7
192.168.1.2Comp-BLoc-BCross Site-Scripting
DNS
8
5



How do I match the ip_address in a separate row without using mvexpand?
MVExpand has memory limitation and is very slow. It will not work in a large data set and time frame
My expected result is:

ip_addresscompanylocationvulnerabilityScore
192.168.1.1Comp-ALoc-ASQL Injection9
192.168.1.1Comp-ALoc-AOpenSSL7
192.168.1.2Comp-BLoc-BCross Site-Scripting8
192.168.1.2Comp-BLoc-BDNS5


Thank you for your help

Labels (3)
Tags (2)
0 Karma
1 Solution

ITWhisperer
SplunkTrust
SplunkTrust

Try this (although there is no guarantee you won't face memory issues with large data sets regardless of whether you use mvexpand or not):

| makeresults
| eval _raw="ip_address	vulnerability	score
192.168.1.1	SQL Injection	9
192.168.1.1	OpenSSL	7
192.168.1.2	Cross Site-Scripting	8
192.168.1.2	DNS	5"
| multikv forceheader=1
| table ip_address vulnerability score
| lookup company.csv ip_address
| eval _rows=mvrange(0,mvcount(company))
| eval company=mvzip(_rows, company, "|")
| eval location=mvzip(_rows, location, "|")
| stats values(*) as * by ip_address vulnerability score _rows
| eval company=mvindex(split(mvindex(company, _rows),"|"),1)
| eval location=mvindex(split(mvindex(location, _rows),"|"),1)

View solution in original post

PickleRick
SplunkTrust
SplunkTrust

+1 to what @ITWhisperer said - your search from the index, if it really produces results as you showed in the first table, should not collapse results after simple lookup into multivalued fields as you showed in the last table. So there must be something you're not showing us here.

LearningGuy
Builder

Hello,
I responded to ITWhisperer.  Please take a look.
I appreciate your help.

Thank you

0 Karma

ITWhisperer
SplunkTrust
SplunkTrust

The search you have used would produce results similar to what you want

| makeresults
| fields - _time
| eval _raw="ip	vulnerability	score
192.168.1.1	SQL Injection	9
192.168.1.1	OpenSSL	7
192.168.1.2	Cross Site-Scripting	8
192.168.1.2	DNS	5"
| multikv forceheader=1
| table ip vulnerability score
| lookup company.csv ip_address as ip OUTPUTNEW ip_address, company, location

ITWhisperer_0-1690009191276.png

If you are getting the results you have indicated, i.e. with the vulnerability and score as multivalue fields, there must be something else in your search doing this which you haven't shared, which is collapsing the events by ip, company or location.

LearningGuy
Builder

Hello,
Sorry. You are correct. Thank you for your help. I was not aware that lookup from index to CSV will not return duplicate data on the index side. I was confused with the output from "append"
I updated the company.csv showing that one IP can be used in multiple companies, so when lookup is used, it merged data in "company" and "location" field.
1) How do I expand merged data in the company and location fields into individual event without using mvexpand?
2) If I have to use mvexpand, what an efficient way to use it without hitting memory limit?

Thank you so much for your help

| index=vulnerability_index
| table ip, vulnerability, score

ipvulnerabilityscore
192.168.1.1SQL Injection9
192.168.1.1OpenSSL7
192.168.1.2Cross Site-Scripting8
192.168.1.2DNS5


company.csv

ip_address

company

location

192.168.1.1

Comp-A

Loc-A

192.168.1.1

Comp-Z

Loc-Z

192.168.1.2

Comp-B

Loc-B

192.168.1.2

Comp-Y

Loc-Y

192.168.1.5

Comp-E

Loc-E


| lookup company.csv ip_address as ip_address OUTPUTNEW ip_address, company, location

ip_addressvulnerabilityscorecompanylocation
192.168.1.1SQL Injection9Comp-A
Comp-Z
Loc-A
Loc-Z
192.168.1.1OpenSSL7Comp-A
Comp-Z
Loc-A
Loc-Z
192.168.1.2Cross Site-Scripting8Comp-B
Comp-Y
Loc-B
Loc-Y
192.168.1.2DNS5Comp-B
Comp-Y
Loc-B
Loc-Y


Expected result with query (without mvexpand):

ip_addressvulnerabilityscorecompanylocation
192.168.1.1SQL Injection9Comp-ALoc-A
192.168.1.1SQL Injection9Comp-ZLoc-Z
192.168.1.1OpenSSL7Comp-ALoc-A
192.168.1.1OpenSSL7Comp-ZLoc-Z
192.168.1.2Cross Site-Scripting8Comp-BLoc-B
192.168.1.2Cross Site-Scripting8Comp-YLoc-Y
192.168.1.2DNS5Comp-BLoc-B
192.168.1.2DNS5Comp-YLoc-Y
Tags (1)
0 Karma

yuanliu
SplunkTrust
SplunkTrust

Similar to your follow-up question in Re: How to perform lookup from index search with d...,, you can use JSON functions if your instance is 8.0 and up.

index = vulnerability_index ``` 
| table ip_address vulnerability score ```
| lookup company.csv ip_address as ip_address OUTPUTNEW ip_address, company, location
| eval idx = mvrange(0, mvcount(company))
| eval company = mvmap(idx, json_object("company", mvindex(company, idx), "location", mvindex(location, idx))
| stats values(vulnerability) as vulnerability values(score) as score values(company) as company by ip_address
| stats values(ip_address) as ip_address by vulnerability score company
| eval location = json_extract(company, "location"), company = json_extract(company, "company")
Tags (1)

LearningGuy
Builder

Hello,
Thank you for your help. I ran your query in the emulated data and it gave me 16 results, instead of 8 in the expected result.  This topic is Index vs CSV using lookup,  the other topic is index vs DBXquery using append.  Lookup vs append does not seem to behave the same. 
Comp-A only should only have 2 vulnerabilities (1 OpenSSL - with score 7 & 1 SQL Injection - with score - 9)
When I ran it in real data, it gave me merged ip_address.  Please suggest. Thank you

vulnerabilityscorecompanyip_addresslocation
OpenSSL7Comp-A192.168.1.1Loc-A
OpenSSL7Comp-Z192.168.1.1Loc-Z
OpenSSL9Comp-A192.168.1.1Loc-A
OpenSSL9Comp-Z192.168.1.1Loc-Z
SQL Injection7Comp-A192.168.1.1Loc-A
SQL Injection7Comp-Z192.168.1.1Loc-Z
SQL Injection9Comp-A192.168.1.1Loc-A
SQL Injection9Comp-Z192.168.1.1Loc-Z
Cross Site-Scripting5Comp-B192.168.1.2Loc-B
Cross Site-Scripting5Comp-Y192.168.1.2Loc-Y
Cross Site-Scripting8Comp-B192.168.1.2Loc-B
Cross Site-Scripting8Comp-Y192.168.1.2Loc-Y
DNS5Comp-B192.168.1.2Loc-B
DNS5Comp-Y192.168.1.2Loc-Y
DNS8Comp-B192.168.1.2Loc-B
DNS8Comp-Y192.168.1.2Loc-Y
0 Karma

yuanliu
SplunkTrust
SplunkTrust

My mistake.  The difference is not in append vs lookup; it is because I (mis)placed JSON company in values() as opposed to in groupby.  Lookup is much simpler to handle, in fact. (And thanks for sharing the output from lookup which I didn't realize was available.  It made diagnosis so much easier.)

 

 

| index=vulnerability_index
| lookup company.csv ip_address as ip_address OUTPUTNEW ip_address, company, location
| eval idx = mvrange(0, mvcount(company))
| eval company = mvmap(idx, json_object("company", mvindex(company, idx), "location", mvindex(location, idx)))
| stats values(score) as score by ip_address company vulnerability
| fromjson company

 

 

Using the sample lookup output you posted, the results are

ip_addresscompanyvulnerabilityscorelocation
192.168.1.1Comp-AOpenSSL7Loc-A
192.168.1.1Comp-ASQL Injection9Loc-A
192.168.1.1Comp-ZOpenSSL7Loc-Z
192.168.1.1Comp-ZSQL Injection9Loc-Z
192.168.1.2Comp-BCross Site-Scripting8Loc-B
192.168.1.2Comp-BDNS5Loc-B
192.168.1.2Comp-YCross Site-Scripting8Loc-Y
192.168.1.2Comp-YDNS5Loc-Y

 

LearningGuy
Builder

Hello,
I tested your query and it worked on both emulated and real data.
The query time is faster than mvindex suggested by ITWhisperer. I would accept both solutions if I could.
You are correct. This post is using outputlookup while the other post was actually using "append", which "the new issue with duplicate companies" has not been solved 
https://community.splunk.com/t5/Splunk-Search/How-to-perform-lookup-from-index-search-with-dbxquery/...

** NULL issue ***
Also, If the company is empty,  your suggested query will not show the company
For example:  Comp-A and Comp-Y are removed.  The output only 4 rows as shown on Table 2.
I tried to use the following command, but it did not help.   Please help.  Thank you so much

 

| fillnull value="N/A" company
| eval company = if(company="" or isnull(company), "N/A",company)

 

Table1 - company.csv   

ip_address

company

location

192.168.1.1

 

Loc-A

192.168.1.1

Comp-Z

Loc-Z

192.168.1.2

Comp-B

Loc-B

192.168.1.2

 

Loc-Y

192.168.1.5

Comp-E

Loc-C


Table2
Result after the query:

ip_address vulnerability score company location
192.168.1.1OpenSSL7Comp-ZLoc-A
192.168.1.1SQL Injection9Comp-ZLoc-A
192.168.1.2Cross Site-Scripting8Comp-BLoc-B
192.168.1.2DNS5Comp-BLoc-B
Tags (2)
0 Karma

yuanliu
SplunkTrust
SplunkTrust

Either should work.  Where do you insert them?

| index=vulnerability_index
| lookup company.csv ip_address as ip_address OUTPUTNEW ip_address, company, location
| fillnull value="N/A" company
| eval idx = mvrange(0, mvcount(company))
| eval company = mvmap(idx, json_object("company", mvindex(company, idx), "location", mvindex(location, idx)))
| stats values(score) as score by ip_address company vulnerability
| fromjson company

LearningGuy
Builder

Hello,
Thank you for looking into this. I put the fillnull after the lookup command. See below:
If you create a new CSV file called company2.csv and replace Comp-A and Comp-Y with "empty" as follows:

ip_address

company

location

192.168.1.1

 

Loc-A

192.168.1.1

Comp-Z

Loc-Z

192.168.1.2

Comp-B

Loc-B

192.168.1.2

 

Loc-Y

192.168.1.5

Comp-E

Loc-E


Run the following query using company2.csv and "fillnull":

makeresults
| eval _raw="ip_address vulnerability score
192.168.1.1 SQL Injection 9
192.168.1.1 OpenSSL 7
192.168.1.2 Cross Site-Scripting 8
192.168.1.2 DNS 5"
| multikv forceheader=1
| lookup company2.csv ip_address as ip_address OUTPUTNEW ip_address, company, location
| fillnull value="N/A" company
| eval idx = mvrange(0, mvcount(company))
| eval company = mvmap(idx, json_object("company", mvindex(company, idx), "location", mvindex(location, idx)))
| stats values(score) as score by ip_address company vulnerability
| fromjson company

The result using company2.csv is:

ip_address company vulnerability score location

192.168.1.1Comp-ZOpenSSL7Loc-A
192.168.1.1Comp-ZSQL Injection9Loc-A
192.168.1.2Comp-BCross Site-Scripting8Loc-B
192.168.1.2Comp-BDNS5Loc-B


The result using company.csv (without removing Comp-A and Comp-Y) is:

ip_address company vulnerability score location

192.168.1.1Comp-AOpenSSL7Loc-A
192.168.1.1Comp-ASQL Injection9Loc-A
192.168.1.1Comp-ZOpenSSL7Loc-Z
192.168.1.1Comp-ZSQL Injection9Loc-Z
192.168.1.2Comp-BCross Site-Scripting8Loc-B
192.168.1.2Comp-BDNS5Loc-B
192.168.1.2Comp-YCross Site-Scripting8Loc-Y
192.168.1.2Comp-YDNS5Loc-Y


For side note (separate issue)
in real data, if I ran "sort" at the end of your JSON query, there are some missing data (compared to no sort or using non JSON query). I am not sure what causing it as it worked just fine in emulated data.   Thanks

0 Karma

yuanliu
SplunkTrust
SplunkTrust

Because company can be multivalued after this kind of lookup.  So, you need to handle that in mvmap.

index=vulnerability_index
| lookup company.csv ip_address as ip_address OUTPUTNEW ip_address, company, location
| eval company = mvmap(company, if(isnull(company), "N/A", company))
| eval idx = mvrange(0, mvcount(company))
| eval company = mvmap(idx, json_object("company", mvindex(company, idx), "location", mvindex(location, idx)))
| stats values(score) as score by ip_address company vulnerability
| fromjson company

 

LearningGuy
Builder

Hello,

After lookup command was executed, the row with the blank company were excluded, so the following eval company did not take into effect.

| eval company = mvmap(company, if(isnull(company), "N/A", company))

If I put eval command before the lookup, it did not work because the company field was in the CSV file which took place after the lookup command.  The issue is: how do we fill NULL in the CSV file.
Thank you



0 Karma

ITWhisperer
SplunkTrust
SplunkTrust

Try this (although there is no guarantee you won't face memory issues with large data sets regardless of whether you use mvexpand or not):

| makeresults
| eval _raw="ip_address	vulnerability	score
192.168.1.1	SQL Injection	9
192.168.1.1	OpenSSL	7
192.168.1.2	Cross Site-Scripting	8
192.168.1.2	DNS	5"
| multikv forceheader=1
| table ip_address vulnerability score
| lookup company.csv ip_address
| eval _rows=mvrange(0,mvcount(company))
| eval company=mvzip(_rows, company, "|")
| eval location=mvzip(_rows, location, "|")
| stats values(*) as * by ip_address vulnerability score _rows
| eval company=mvindex(split(mvindex(company, _rows),"|"),1)
| eval location=mvindex(split(mvindex(location, _rows),"|"),1)

LearningGuy
Builder

Hello,
Thank you for your help. I ran your query on both emulated data and real data and it worked fine.

1. I found duplicates rows and was able to remove duplicate rows with dedup. At first I thought values would remove duplicates.   Is there a way to remove duplicates without using dedup?

2. Can I use lookup and the same splitting methods you suggested if I am combining index and DBXquery (instead of CSV), or I have to use "Append/join" with the same method?

Here's the other post - The solution provided was good only if IPs are not duplicate.  Thank you again!!
https://community.splunk.com/t5/Splunk-Search/How-to-perform-lookup-from-index-search-with-dbxquery/...

0 Karma

ITWhisperer
SplunkTrust
SplunkTrust

It depends on the source of the duplicates.

lookup is not a generating command whereas dbxquery is (I think), so I suspect you would have to use append (or join).

LearningGuy
Builder

Hello,

If the company is empty,  your suggested query (stats value)  will not show the company
For example:  Comp-A and Comp-Y are removed.  The output only 4 rows as shown on Table 2.
I tried to use the following command, but it did not help.   Please help.  Thank you so much

 

| fillnull value="N/A" company
| eval company = if(company="" or isnull(company), "N/A",company)

 

Table1 - company.csv   

ip_address

company

location

192.168.1.1

 

Loc-A

192.168.1.1

Comp-Z

Loc-Z

192.168.1.2

Comp-B

Loc-B

192.168.1.2

 

Loc-Y

192.168.1.5

Comp-E

Loc-C


Table2
Result after the query:

 
ip_address vulnerability score company location
192.168.1.1OpenSSL7Comp-ZLoc-A
192.168.1.1SQL Injection9Comp-ZLoc-A
192.168.1.2Cross Site-Scripting8Comp-BLoc-B
192.168.1.2DNS5Comp-BLoc-B
Tags (2)
0 Karma
Get Updates on the Splunk Community!

Harnessing Splunk’s Federated Search for Amazon S3

Managing your data effectively often means balancing performance, costs, and compliance. Splunk’s Federated ...

Infographic provides the TL;DR for the 2024 Splunk Career Impact Report

We’ve been buzzing with excitement about the recent validation of Splunk Education! The 2024 Splunk Career ...

Enterprise Security Content Update (ESCU) | New Releases

In December, the Splunk Threat Research Team had 1 release of new security content via the Enterprise Security ...