Splunk Search

How to perform lookup from index search with dbxquery?

LearningGuy
Builder

How to perform lookup from index search with dbxquery?

| index=vulnerability_index
| table ip_address, vulnerability, score

ip_addressvulnerabilityscore
192.168.1.1SQL Injection9
192.168.1.1OpenSSL7
192.168.1.2Cross Site-Scripting8
192.168.1.2DNS5


| dbxquery query="select * from tableCompany"

ip_addresscompanylocation
192.168.1.1Comp-ALoc-A
192.168.1.2Comp-BLoc-B
192.168.1.5Comp-ELoc-E


 After lookup IP in dbxquery:

ip_addresscompanylocationvulnerabilityScore
192.168.1.1Comp-ALoc-ASQL Injection9
192.168.1.1Comp-ALoc-AOpenSSL7
192.168.1.2Comp-BLoc-BCross Site-Scripting8
192.168.1.2Comp-BLoc-BDNS5


Thank you so much

Labels (2)
Tags (3)
0 Karma
1 Solution

yuanliu
SplunkTrust
SplunkTrust

To put each vulnerability and score pair in a single row, first pair them then split them.

index=vulnerability_index
| table ip_address, vulnerability, score
| append
    [| dbxquery query="select * from tableCompany" ```select ip_address,company,location from tableCompany```
    | spath input="SubnetID-IP" path={}
    | mvexpand {}
    | spath input={} path=ip_address
    | table Company, Location, ip_address ]
| stats values(Company) as Company values(Location) as Location values(eval(vulnerability. "<:>" . score)) as vulnerability by ip_address
| mvexpand vulnerability
| eval vulnerability = split(vulnerability, "<:>")
| eval score = mvindex(vulnerability, 1), vulnerability = mvindex(vulnerability, 0)

(The reason why I'm using a complex join-split pattern is because vulnerability may contain common separator characters.)

View solution in original post

LearningGuy
Builder

Hello,
Sorry I wasn't clear,
DBXquery has field "SubnetID-IP" , not ip_address

Company        Location       SubnetID-IP
Comp-ALoc- A  [{"subnet_id":"101","ip_address":"192.168.1.1"},{"subnet_id":"121","ip_address":"192.168.1.21"},{"subnet_id":"131","ip_address":"192.168.1.31"}]
Comp-BLoc-B [{"subnet_id":"102","ip_address":"192.168.1.2"},{"subnet_id":"122","ip_address":"192.168.1.22"},{"subnet_id":"123","ip_address":"192.168.1.23"}]
Comp-ELoc-E[{"subnet_id":"105","ip_address":"192.168.1.5"},{"subnet_id":"152","ip_address":"192.168.1.52"}]


SubnetID-IP needs to be parsed using
| spath input=SubnetID-IP path={}
| mvexpand {}
| spath input={}
| table Company, Location, ip_address
The output after parsing is below:    

company        location       ip_address
Comp-ALoc- A 192.168.1.1
Comp-BLoc-B192.168.1.2
Comp-ELoc-E192.168.1.5

How do I combine dbquery and spath and index? 
Thanks

0 Karma

yuanliu
SplunkTrust
SplunkTrust

 

Just add JSON handling inside the subsearch.

 

index=vulnerability_index
| table ip_address, vulnerability, score
| append
    [| dbxquery query="select * from tableCompany" ```select ip_address,company,location from tableCompany```
    | spath input="SubnetID-IP" path={}
    | mvexpand {}
    | spath input={} path=ip_address
    | table Company, Location, ip_address ]
| stats values(*) as * by ip_address

 

This is my emulation

 

| makeresults
| eval _raw = "ip_address	vulnerability	score
192.168.1.1	SQL Injection	9
192.168.1.1	OpenSSL	7
192.168.1.2	Cross Site-Scripting	8
192.168.1.2	DNS	5"

| multikv forceheader=1
``` the above emulates index=vulnerability_index ```
| table ip_address, vulnerability, score
| append
    [| makeresults
    | eval _raw = "Company        	Location       	SubnetID-IP
Comp-A	Loc-A 	 [{\"subnet_id\":\"101\",\"ip_address\":\"192.168.1.1\"},{\"subnet_id\":\"121\",\"ip_address\":\"192.168.1.21\"},{\"subnet_id\":\"131\",\"ip_address\":\"192.168.1.31\"}]
Comp-B	Loc-B	 [{\"subnet_id\":\"102\",\"ip_address\":\"192.168.1.2\"},{\"subnet_id\":\"122\",\"ip_address\":\"192.168.1.22\"},{\"subnet_id\":\"123\",\"ip_address\":\"192.168.1.23\"}]
Comp-E	Loc-E	[{\"subnet_id\":\"105\",\"ip_address\":\"192.168.1.5\"},{\"subnet_id\":\"152\",\"ip_address\":\"192.168.1.52\"}]"
    | multikv forceheader=1
    | rename *________ as *, *_______ as *
    | fields - _* linecount
    ``` the above emulates | dbxquery query="select * from tableCompany" ```
    | spath input="SubnetID_IP" path={} ``` extract changes field name ```
    | mvexpand {}
    | spath input={} path=ip_address
   ]
| stats values(Company) as Company values(Location) as Location values(vulnerability) as vulnerability values(score) as score by ip_address

 

The result is

ip_addressCompanyLocation
vulnerability
score
192.168.1.1Comp-ALoc-A
OpenSSL
SQL Injection
7
9
192.168.1.2Comp-BLoc-B
Cross Site-Scripting
DNS
5
8
192.168.1.21Comp-ALoc-A  
192.168.1.22Comp-BLoc-B  
192.168.1.23Comp-BLoc-B  
192.168.1.31Comp-ALoc-A  
192.168.1.5Comp-ELoc-E  
192.168.1.52Comp-ELoc-E  

 

LearningGuy
Builder

Hello,
Thank you for your help. I tried both of your queries, and they worked exactly shown in your result.
However, the result combined all the vulnerabilities in one row (OpenSSL and SQL Injection), and all the scores in one row (7 and 9), and after I expanded into a separate row, the score did not match the vulnerability because it got mixed up when it was merged (OpenSSL should have score 7, not both 7 and 9) . 
Please suggest. Thank you
| search vulnerability=*       'remove blank space
| mvexpand vulnerability
| mvexpand score

ip_addressCompanyLocation
vulnerability
score
192.168.1.1Comp-ALoc-A
OpenSSL
7
192.168.1.1Comp-ALoc-A
SQL Injection
9
192.168.1.1Comp-ALoc-A
OpenSSL
7
192.168.1.1Comp-ALoc-A
SQL Injection
9
192.168.1.2Comp-BLoc-B
Cross Site-Scripting
5
 
192.168.1.2Comp-BLoc-BDNS8
192.168.1.2Comp-BLoc-B
Cross Site-Scripting
5
 
192.168.1.2Comp-BLoc-BDNS8
0 Karma

yuanliu
SplunkTrust
SplunkTrust

To put each vulnerability and score pair in a single row, first pair them then split them.

index=vulnerability_index
| table ip_address, vulnerability, score
| append
    [| dbxquery query="select * from tableCompany" ```select ip_address,company,location from tableCompany```
    | spath input="SubnetID-IP" path={}
    | mvexpand {}
    | spath input={} path=ip_address
    | table Company, Location, ip_address ]
| stats values(Company) as Company values(Location) as Location values(eval(vulnerability. "<:>" . score)) as vulnerability by ip_address
| mvexpand vulnerability
| eval vulnerability = split(vulnerability, "<:>")
| eval score = mvindex(vulnerability, 1), vulnerability = mvindex(vulnerability, 0)

(The reason why I'm using a complex join-split pattern is because vulnerability may contain common separator characters.)

LearningGuy
Builder

Hello,
Thanks for your help. 
Your suggestion worked fine on the emulated data, it still needs to filter out empty vulnerability (  | search vulnerability=*)
In real data, your query worked with the following findings:
1) Very slow
2) Location field also has "merged" data
3) There are additional fields that need to be merged using ("<:>"), then expand - which may cause more slowness

I will accept your solution in here
Do you have any other alternative solutions?
Can you perhaps give a simple example using external lookup - inside python code?
Thank you so much.

0 Karma

yuanliu
SplunkTrust
SplunkTrust

You can use the job inspector (in the Job menu) to gain some insight about time consumption.  But performance tuning with Splunk - or any program, can be complicated.  A lot depend on data characteristics.  The interaction between dbxquery and an external database makes improvement even trickier.

Without knowing characteristics of dbxquery, I can think of the following

  • If there are many entries without vulnerabilities, removing them early can reduce the number of wasted row calculations.
  • Whereas join-mvexpand-split used to be the main weapon in SPL to handle this type of multivalue problems, Splunk 8 introduced JSON functions as a stand-in for hash datatype.  I have no insight into how efficient they are, but using split and mvindex on every row must be more expensive.

Combining these two thoughts, you can try the following:

 

index=vulnerability_index
| eval vulnerability = json_object("vulnerability", vulnerability, "score", score)
``` table at this point is generally a waste ```
| append
    [| dbxquery query="select * from tableCompany" ```select 'SubnetID-IP',company,location from tableCompany```
    | spath input="SubnetID-IP" path={}
    | mvexpand {}
    | spath input={} path=ip_address ]
| stats values(Company) as Company values(Location) as Location values(vulnerability) as vulnerability by ip_address
| stats values(ip_address) as ip_address by Company Location vulnerability
``` this automatically eliminates rows without vulnerability ```
| eval score = json_extract(vulnerability, "score"), vulnerability = json_extract(vulnerability, "vulnerability")
| table ip_address Company Location vulnerability score

 

You can also reduce number of rows by restricting dbxquery to only SubnetID-IP that the vulnerability_index search returns with a subsearch.  How to do this depends on the real index search. (I suspect that you are not returning the entire index in the real job.)

Hope this helps.

LearningGuy
Builder

Hello,
Thanks for your help. I was able to run your JSON query just fine. It turned out that one ip_address can be used in multiple company, so the merged data now is in "vulnerability" , "company", and "location" field
1. Is it possible to use lookup from index to Dbx query?  
     When I used lookup from index to CSV, it only returns merged data on the CSV side, not on the index side
2. Why don't we use JOIN?

Before parse:

company        

location       

SubnetID-IP

Comp-A

Loc- A 

 [{"subnet_id":"101","ip_address":"192.168.1.1"},
{"subnet_id":"121","ip_address":"192.168.1.21"},
{"subnet_id":"131","ip_address":"192.168.1.31"}]

Comp-Z

Loc- Z

 [{"subnet_id":"101","ip_address":"192.168.1.1"},
{"subnet_id":"121","ip_address":"192.168.1.21"},
{"subnet_id":"131","ip_address":"192.168.1.31"}]

Comp-B

Loc-B

 [{"subnet_id":"102","ip_address":"192.168.1.2"},
{"subnet_id":"122","ip_address":"192.168.1.22"},
{"subnet_id":"123","ip_address":"192.168.1.23"}]

Comp-Y

Loc-Y

 [{"subnet_id":"102","ip_address":"192.168.1.2"},
{"subnet_id":"122","ip_address":"192.168.1.22"},
{"subnet_id":"123","ip_address":"192.168.1.23"}]

Comp-E

Loc-E

[{"subnet_id":"105","ip_address":"192.168.1.5"},
{"subnet_id":"152","ip_address":"192.168.1.52"}]


After Parse:  

ip_address

company

location

192.168.1.1

Comp-A

Loc-A

192.168.1.1

Comp-Z

Loc-Z

192.168.1.2

Comp-B

Loc-B

192.168.1.2

Comp-Y

Loc-Y

192.168.1.5

Comp-E

Loc-E


After append:  both vulnerability and company are merged

ip_address

vulnerability

score

company

location

192.168.1.1

OpenSSL
SQL Injection

7

Comp-A
Comp-Z

Loc-A
Loc-Z

192.168.1.2

Cross Site-Scripting
DNS

8

Comp-B
Comp-Y

Loc-B
Loc-Y

 

Expected result:

ip_address

vulnerability

score

company

location

192.168.1.1

OpenSSL

7

Comp-A

Loc-A

192.168.1.1

SQL Injection

9

Comp-Z

Loc-A

192.168.1.2

Cross Site-Scripting

8

Comp-B

Loc-B

192.168.1.2

DNS

5

Comp-Y

Loc-B

Tags (2)
0 Karma

LearningGuy
Builder

Hello, 
I ran your query and it seems like it just appended the row without matching the IP. 
This question is related to the other post that you helped me "split pattern into multiple rows"
https://community.splunk.com/t5/Splunk-Search/Split-pattern-into-multiple-rows/m-p/650660

My goal is to replace the table.csv, which was obtained from
| dbxquery connection=visibility query="select Company, Location, SubnetID-IP from tableCompany"
then parsed the IP and SubnetID using spath, and performed mvexpand to split into single event 

Instead of running dbxquery and put the data into table.csv to perform a IP lookup from index,
is it possible to run a single query that combine index search and dbxquery to match/lookup an IP address?
Thanks

0 Karma

yuanliu
SplunkTrust
SplunkTrust

First, there was a syntax error copied from the original post.  There should not be a leading pipe for index search:

 

index=vulnerability_index
| table ip_address, vulnerability, score
| append
    [| dbxquery query="select * from tableCompany" ```select ip_address,company,location from tableCompany```]
| stats values(*) as * by ip_address

 

Second, can you confirm that both index=vulnerability_index and | dbxquery query="select * from tableCompany" return a field named ip_address, and that some of their values match?  In fact, if there is no matching values, no output should exist; if field name ip_address only exists in one search, output should contain only entries from that search.  So your described output is very puzzling.

Yes, the above suggested search is intended to run a query directly from Splunk and perform a match.  When I suggested running an external query command as external lookup, I mean Create an external lookup, not to run a query and create a static (CSV or KV store) lookup. (On such external lookup is shipped with Splunk Enterprise, dnslookup.) The effect is still the same: a query is performed at the time of the search.  But you don't have to perform a stats to make association.

To help you diagnose, I have to emulations, one to emulate the index search and return rows as you posted, the other to emulate | dbxquery and return rows as you posted.  The two are combined in the same fashion as the above command:

 

| makeresults
| eval _raw = "ip_address	vulnerability	score
192.168.1.1	SQL Injection	9
192.168.1.1	OpenSSL	7
192.168.1.2	Cross Site-Scripting	8
192.168.1.2	DNS	5"

| multikv forceheader=1
``` the above emulates index=vulnerability_index ```
| table ip_address, vulnerability, score
| append
    [| makeresults
    | eval _raw = "ip_address	company	location
192.168.1.1	Comp-A	Loc-A
192.168.1.2	Comp-B	Loc-B
192.168.1.5	Comp-E	Loc-E"
    | multikv forceheader=1
    | fields - _* linecount
    ``` the above emulates | dbxquery query="select * from tableCompany" ```
    ]
| stats values(*) as * by ip_address

 

They give me the output that you desired, as

ip_addresscompanylocation
score
vulnerability
192.168.1.1Comp-ALoc-A
7
9
OpenSSL
SQL Injection
192.168.1.2Comp-BLoc-B
5
8
Cross Site-Scripting
DNS
192.168.1.5Comp-ELoc-E  

You can play with the emulation and compare with real data.

yuanliu
SplunkTrust
SplunkTrust

dbxquery cannot be used as lookup directly.  You can do something like

| index=vulnerability_index
| table ip_address, vulnerability, score
| append
    [| dbxquery query="select * from tableCompany" ```select ip_address,company,location from tableCompany```]
| stats values(*) as * by ip_address

But lookup is much more efficient.  You should consider set up an external lookup script to lookup from your database.  That way, you can use lookup as intended.

| index=vulnerability_index
| table ip_address, vulnerability, score
| lookup lookupCompany ip_address ``` output company,location ```

where lookupCompany is the lookup you set up to perform the SQL query against that database.

Tags (1)
Get Updates on the Splunk Community!

Developer Spotlight with Paul Stout

Welcome to our very first developer spotlight release series where we'll feature some awesome Splunk ...

State of Splunk Careers 2024: Maximizing Career Outcomes and the Continued Value of ...

For the past four years, Splunk has partnered with Enterprise Strategy Group to conduct a survey that gauges ...

Data-Driven Success: Splunk & Financial Services

Splunk streamlines the process of extracting insights from large volumes of data. In this fast-paced world, ...