How to perform a lookup from an index search with dbxquery?
| index=vulnerability_index
| table ip_address, vulnerability, score
ip_address | vulnerability | score |
192.168.1.1 | SQL Injection | 9 |
192.168.1.1 | OpenSSL | 7 |
192.168.1.2 | Cross Site-Scripting | 8 |
192.168.1.2 | DNS | 5 |
| dbxquery query="select * from tableCompany"
ip_address | company | location |
192.168.1.1 | Comp-A | Loc-A |
192.168.1.2 | Comp-B | Loc-B |
192.168.1.5 | Comp-E | Loc-E |
After looking up the IP in dbxquery:
ip_address | company | location | vulnerability | score |
192.168.1.1 | Comp-A | Loc-A | SQL Injection | 9 |
192.168.1.1 | Comp-A | Loc-A | OpenSSL | 7 |
192.168.1.2 | Comp-B | Loc-B | Cross Site-Scripting | 8 |
192.168.1.2 | Comp-B | Loc-B | DNS | 5 |
Thank you so much
Hello,
Sorry, I wasn't clear.
dbxquery has a field "SubnetID-IP", not ip_address:
Company | Location | SubnetID-IP |
Comp-A | Loc-A | [{"subnet_id":"101","ip_address":"192.168.1.1"},{"subnet_id":"121","ip_address":"192.168.1.21"},{"subnet_id":"131","ip_address":"192.168.1.31"}] |
Comp-B | Loc-B | [{"subnet_id":"102","ip_address":"192.168.1.2"},{"subnet_id":"122","ip_address":"192.168.1.22"},{"subnet_id":"123","ip_address":"192.168.1.23"}] |
Comp-E | Loc-E | [{"subnet_id":"105","ip_address":"192.168.1.5"},{"subnet_id":"152","ip_address":"192.168.1.52"}] |
SubnetID-IP needs to be parsed using:
| spath input="SubnetID-IP" path={}
| mvexpand {}
| spath input={}
| table Company, Location, ip_address
The output after parsing is below:
company | location | ip_address |
Comp-A | Loc-A | 192.168.1.1 |
Comp-B | Loc-B | 192.168.1.2 |
Comp-E | Loc-E | 192.168.1.5 |
How do I combine dbxquery, spath, and the index search?
Thanks
Just add JSON handling inside the subsearch.
index=vulnerability_index
| table ip_address, vulnerability, score
| append
[| dbxquery query="select * from tableCompany" ```select Company,Location,'SubnetID-IP' from tableCompany```
| spath input="SubnetID-IP" path={}
| mvexpand {}
| spath input={} path=ip_address
| table Company, Location, ip_address ]
| stats values(*) as * by ip_address
This is my emulation
| makeresults
| eval _raw = "ip_address vulnerability score
192.168.1.1 SQL Injection 9
192.168.1.1 OpenSSL 7
192.168.1.2 Cross Site-Scripting 8
192.168.1.2 DNS 5"
| multikv forceheader=1
``` the above emulates index=vulnerability_index ```
| table ip_address, vulnerability, score
| append
[| makeresults
| eval _raw = "Company Location SubnetID-IP
Comp-A Loc-A [{\"subnet_id\":\"101\",\"ip_address\":\"192.168.1.1\"},{\"subnet_id\":\"121\",\"ip_address\":\"192.168.1.21\"},{\"subnet_id\":\"131\",\"ip_address\":\"192.168.1.31\"}]
Comp-B Loc-B [{\"subnet_id\":\"102\",\"ip_address\":\"192.168.1.2\"},{\"subnet_id\":\"122\",\"ip_address\":\"192.168.1.22\"},{\"subnet_id\":\"123\",\"ip_address\":\"192.168.1.23\"}]
Comp-E Loc-E [{\"subnet_id\":\"105\",\"ip_address\":\"192.168.1.5\"},{\"subnet_id\":\"152\",\"ip_address\":\"192.168.1.52\"}]"
| multikv forceheader=1
| rename *________ as *, *_______ as *
| fields - _* linecount
``` the above emulates | dbxquery query="select * from tableCompany" ```
| spath input="SubnetID_IP" path={} ``` extract changes field name ```
| mvexpand {}
| spath input={} path=ip_address
]
| stats values(Company) as Company values(Location) as Location values(vulnerability) as vulnerability values(score) as score by ip_address
The result is
ip_address | Company | Location | vulnerability | score |
192.168.1.1 | Comp-A | Loc-A | OpenSSL SQL Injection | 7 9 |
192.168.1.2 | Comp-B | Loc-B | Cross Site-Scripting DNS | 5 8 |
192.168.1.21 | Comp-A | Loc-A | ||
192.168.1.22 | Comp-B | Loc-B | ||
192.168.1.23 | Comp-B | Loc-B | ||
192.168.1.31 | Comp-A | Loc-A | ||
192.168.1.5 | Comp-E | Loc-E | ||
192.168.1.52 | Comp-E | Loc-E ||
Hello,
Thank you for your help. I tried both of your queries, and they worked exactly as shown in your result.
However, the result combined all the vulnerabilities into one row (OpenSSL and SQL Injection) and all the scores into one row (7 and 9). After I expanded them into separate rows, the scores no longer matched the vulnerabilities because they were mixed up when they were merged (OpenSSL should have score 7, not both 7 and 9).
Please suggest. Thank you
| search vulnerability=* ``` remove rows with blank vulnerability ```
| mvexpand vulnerability
| mvexpand score
ip_address | Company | Location | vulnerability | score |
192.168.1.1 | Comp-A | Loc-A | OpenSSL | 7 |
192.168.1.1 | Comp-A | Loc-A | SQL Injection | 9 |
192.168.1.1 | Comp-A | Loc-A | OpenSSL | 7 |
192.168.1.1 | Comp-A | Loc-A | SQL Injection | 9 |
192.168.1.2 | Comp-B | Loc-B | Cross Site-Scripting | 5 |
192.168.1.2 | Comp-B | Loc-B | DNS | 8 |
192.168.1.2 | Comp-B | Loc-B | Cross Site-Scripting | 5 |
192.168.1.2 | Comp-B | Loc-B | DNS | 8 |
To put each vulnerability and score pair in a single row, first pair them, then split them:
index=vulnerability_index
| table ip_address, vulnerability, score
| append
[| dbxquery query="select * from tableCompany" ```select Company,Location,'SubnetID-IP' from tableCompany```
| spath input="SubnetID-IP" path={}
| mvexpand {}
| spath input={} path=ip_address
| table Company, Location, ip_address ]
| stats values(Company) as Company values(Location) as Location values(eval(vulnerability. "<:>" . score)) as vulnerability by ip_address
| mvexpand vulnerability
| eval vulnerability = split(vulnerability, "<:>")
| eval score = mvindex(vulnerability, 1), vulnerability = mvindex(vulnerability, 0)
(The reason I'm using this relatively complex join-then-split pattern is that vulnerability values may contain common separator characters.)
Hello,
Thanks for your help.
Your suggestion worked fine on the emulated data, although it still needs to filter out empty vulnerabilities (| search vulnerability=*).
On real data, your query worked, with the following findings:
1) It is very slow.
2) The Location field also has "merged" data.
3) There are additional fields that need to be merged using "<:>" and then expanded, which may cause more slowness.
I will accept your solution here.
Do you have any other alternative solutions?
Could you perhaps give a simple example using an external lookup, implemented in Python code?
Thank you so much.
You can use the job inspector (in the Job menu) to gain some insight into time consumption. But performance tuning with Splunk, or with any program, can be complicated. A lot depends on data characteristics. The interaction between dbxquery and an external database makes improvement even trickier.
Without knowing the characteristics of dbxquery, I can think of the following:
1) Pack vulnerability, score, and any other fields that must stay paired into a single JSON object, so their association survives the merge without a custom separator.
2) Use a second stats instead of mvexpand to split the merged rows back out, as stats is usually cheaper.
Combining these two thoughts, you can try the following:
index=vulnerability_index
| eval vulnerability = json_object("vulnerability", vulnerability, "score", score)
``` table at this point is generally a waste ```
| append
[| dbxquery query="select * from tableCompany" ```select 'SubnetID-IP',company,location from tableCompany```
| spath input="SubnetID-IP" path={}
| mvexpand {}
| spath input={} path=ip_address ]
| stats values(Company) as Company values(Location) as Location values(vulnerability) as vulnerability by ip_address
| stats values(ip_address) as ip_address by Company Location vulnerability
``` this automatically eliminates rows without vulnerability ```
| eval score = json_extract(vulnerability, "score"), vulnerability = json_extract(vulnerability, "vulnerability")
| table ip_address Company Location vulnerability score
You can also reduce the number of rows by using a subsearch to restrict dbxquery to only the SubnetID-IP values that the vulnerability_index search returns. How to do this depends on the real index search. (I suspect that you are not returning the entire index in the real job.)
Hope this helps.
Hello,
Thanks for your help. I was able to run your JSON query just fine. It turns out that one ip_address can be used by multiple companies, so the merged data now appears in the "vulnerability", "company", and "location" fields.
1. Is it possible to use a lookup from the index into the dbxquery results?
When I used a lookup from the index into a CSV, it only returned merged data on the CSV side, not on the index side.
2. Why don't we use JOIN?
Before parsing:
company | location | SubnetID-IP |
Comp-A | Loc-A | [{"subnet_id":"101","ip_address":"192.168.1.1"}, |
Comp-Z | Loc-Z | [{"subnet_id":"101","ip_address":"192.168.1.1"}, |
Comp-B | Loc-B | [{"subnet_id":"102","ip_address":"192.168.1.2"}, |
Comp-Y | Loc-Y | [{"subnet_id":"102","ip_address":"192.168.1.2"}, |
Comp-E | Loc-E | [{"subnet_id":"105","ip_address":"192.168.1.5"}, |
After parsing:
ip_address | company | location |
192.168.1.1 | Comp-A | Loc-A |
192.168.1.1 | Comp-Z | Loc-Z |
192.168.1.2 | Comp-B | Loc-B |
192.168.1.2 | Comp-Y | Loc-Y |
192.168.1.5 | Comp-E | Loc-E |
After append: both vulnerability and company are merged
ip_address | vulnerability | score | company | location |
192.168.1.1 | OpenSSL | 7 | Comp-A | Loc-A |
192.168.1.2 | Cross Site-Scripting | 8 | Comp-B | Loc-B |
Expected result:
ip_address | vulnerability | score | company | location |
192.168.1.1 | OpenSSL | 7 | Comp-A | Loc-A |
192.168.1.1 | SQL Injection | 9 | Comp-Z | Loc-Z |
192.168.1.2 | Cross Site-Scripting | 8 | Comp-B | Loc-B |
192.168.1.2 | DNS | 5 | Comp-Y | Loc-Y |
Hello,
I ran your query, and it seems like it just appended the rows without matching the IPs.
This question is related to another post where you helped me, "Split pattern into multiple rows":
https://community.splunk.com/t5/Splunk-Search/Split-pattern-into-multiple-rows/m-p/650660
My goal is to replace table.csv, which was obtained by running
| dbxquery connection=visibility query="select Company, Location, SubnetID-IP from tableCompany"
then parsing the IP and subnet ID with spath, and performing mvexpand to split the result into single events.
Instead of running dbxquery and putting the data into table.csv to perform an IP lookup from the index, is it possible to run a single query that combines the index search and dbxquery to match/look up an IP address?
Thanks
First, there was a syntax error copied from the original post: there should not be a leading pipe before an index search.
index=vulnerability_index
| table ip_address, vulnerability, score
| append
[| dbxquery query="select * from tableCompany" ```select ip_address,company,location from tableCompany```]
| stats values(*) as * by ip_address
Second, can you confirm that both index=vulnerability_index and | dbxquery query="select * from tableCompany" return a field named ip_address, and that some of their values match? In fact, if there are no matching values, no output should exist; if the field ip_address only exists in one search, the output should contain only entries from that search. So the output you described is very puzzling.
Yes, the search suggested above is intended to run a query directly from Splunk and perform a match. When I suggested running an external query command as an external lookup, I meant creating an external lookup, not running a query and creating a static (CSV or KV store) lookup. (One such external lookup, dnslookup, ships with Splunk Enterprise.) The effect is still the same: a query is performed at the time of the search, but you don't have to perform a stats to make the association.
To help you diagnose, here are two emulations: one to emulate the index search and return the rows you posted, the other to emulate | dbxquery and return the rows you posted. The two are combined in the same fashion as the command above:
| makeresults
| eval _raw = "ip_address vulnerability score
192.168.1.1 SQL Injection 9
192.168.1.1 OpenSSL 7
192.168.1.2 Cross Site-Scripting 8
192.168.1.2 DNS 5"
| multikv forceheader=1
``` the above emulates index=vulnerability_index ```
| table ip_address, vulnerability, score
| append
[| makeresults
| eval _raw = "ip_address company location
192.168.1.1 Comp-A Loc-A
192.168.1.2 Comp-B Loc-B
192.168.1.5 Comp-E Loc-E"
| multikv forceheader=1
| fields - _* linecount
``` the above emulates | dbxquery query="select * from tableCompany" ```
]
| stats values(*) as * by ip_address
They give me the output that you desired:
ip_address | company | location | score | vulnerability |
192.168.1.1 | Comp-A | Loc-A | 7 9 | OpenSSL SQL Injection |
192.168.1.2 | Comp-B | Loc-B | 5 8 | Cross Site-Scripting DNS |
192.168.1.5 | Comp-E | Loc-E ||
You can play with the emulation and compare with real data.
dbxquery cannot be used as a lookup directly. You can do something like
index=vulnerability_index
| table ip_address, vulnerability, score
| append
[| dbxquery query="select * from tableCompany" ```select ip_address,company,location from tableCompany```]
| stats values(*) as * by ip_address
But a lookup is much more efficient. You should consider setting up an external lookup script that performs the query against your database. That way, you can use lookup as intended:
index=vulnerability_index
| table ip_address, vulnerability, score
| lookup lookupCompany ip_address ``` output company,location ```
where lookupCompany is the lookup you set up to perform the SQL query against that database.
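For reference, here is a minimal sketch of what such an external lookup script could look like. Everything in it is hypothetical rather than taken from this thread: the script name company_lookup.py, the connection details, and the assumption that your database can be queried one IP at a time (for example through a view, called companyByIp below, that flattens the SubnetID-IP JSON column). The fixed part is the protocol: Splunk sends the lookup fields to the script as CSV on stdin and expects the same CSV back on stdout with the output fields filled in.

#!/usr/bin/env python3
# company_lookup.py - hypothetical external lookup sketch.
# Splunk external lookups receive the fields listed in transforms.conf
# as CSV on stdin and must emit the same CSV, with the output fields
# filled in, on stdout. (The field names also arrive as argv, but the
# CSV header is enough here.)
import csv
import sys
import sqlite3  # stand-in driver; substitute pyodbc, psycopg2, etc.

def lookup_company(cur, ip_address):
    # Hypothetical query against a view that maps one IP per row,
    # since tableCompany stores IPs inside the SubnetID-IP JSON column.
    # Note: the "?" parameter style is sqlite3's; other drivers differ.
    cur.execute(
        "SELECT company, location FROM companyByIp WHERE ip_address = ?",
        (ip_address,),
    )
    return cur.fetchone()  # (company, location) or None

def main():
    conn = sqlite3.connect("company.db")  # hypothetical connection
    cur = conn.cursor()
    reader = csv.DictReader(sys.stdin)
    if reader.fieldnames is None:
        return  # no input rows
    writer = csv.DictWriter(sys.stdout, fieldnames=reader.fieldnames)
    writer.writeheader()
    for row in reader:
        if row.get("ip_address"):
            match = lookup_company(cur, row["ip_address"])
            if match:
                row["company"], row["location"] = match
        writer.writerow(row)

if __name__ == "__main__":
    main()

The script would live in your app's bin directory and be wired up in transforms.conf with a stanza roughly like this (names again illustrative):
[lookupCompany]
external_cmd = company_lookup.py ip_address company location
fields_list = ip_address, company, location
python.version = python3
With that in place, the | lookup lookupCompany ip_address search above queries the database at search time, in batches of IPs, without any append or stats.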