Splunk Search

Is there a better way to join two sub-searches and 2 lookup-tables to main search?

ccloutralex
Observer

So lets say i have three searches i need to join data from:

Main search (search_int) has the following fields:

Computer_name
ip_address
data_to_be_joined_1
data_to_be_joined_2

subsearch (search_vul)
data_to_be_joined_1
data_to_be_extracted_after_join_on_1
another_data_to_be_extracted_after_join_on_1

subsearch (search_realname)
computer_name ( needs to be joined with mainsearch computer_name)
ipv4 (needs to be joined with mainsearch ip_address)
other_name_to_be_extracted

Now I would like to join this data and have a table that reads other_name_to_be_extracted computer_name Ip_address data_to_be_joined_1 data_to_be_extracted_after_join_on_1 another_data_to_be_extracted_after_join_on_1.

Now moving forward i will need to do some math based on the sum of (data_to_be_extracted_after_join_on_1 another_data_to_be_extracted_after_join_on_1) and display the totals as new column BY other_name_to_be_extracted.
My current search is below. I am having issues with it missing data. this is because the results are > 360k. other than that it works fine. Is there a better way to do this? I know I can use stats but I am unsure how to do something this complex...

index=v sourcetype=json:search_int AND (payload="*X*") status!="Rem" 
 | dedup vul,computer_name 
 | eval vul=lower(vul)
 | eval data_to_be_joined_1=lower(data_to_be_joined_1)
 | rename vul as Vul,sev as Sev
 | eval computer_name=lower(computer_name)  
 | eval ip_address = secondaryid   
 | join data_to_be_joined_1 type=left
 [search index=vul sourcetype="json:search_vul" indexr="v2" description!="** REJECT **" | eval data_to_be_joined_1x=lower(data_to_be_joined_1x)
 | rename data_to_be_joined_1x as data_to_be_joined_1]
 | join computer_name,ip_address 
 [search index=vul sourcetype=json:map name!="None" indexr="S" 
| eval computer_name=lower(computername) 
| eval ip_address = ipv4]
| eval Sev=upper(Sev)
| lookup xxx.csv base_sev as Sev OUTPUT avg(base) as BS2, avg(imp) as IS2
| eval cadjusted=case((base=="Unknown" OR base==""),BS2,true(),base) 
| eval impadjusted=case((imp=="Unknown" OR imp=="") ,IS2,true(),imp) 
| eventstats sum(imp) as impt by name
| eventstats sum(cadjusted) as ctotal by name
| eventstats sum(cvss_adjusted) as allcvsstotal
| eventstats sum(impadjusted) as allimpacttotal
| eval T1 = cvtotal + imptotal
| eval T2 = allctotal + allimptotal
| eval Tp3 = ctotal / allctotal
| eval Tp4 = imptotal / allimptotal
| eval TPA5 = (T1 / T2) * 100
| eventstats avg(cadjusted) as cAvg
| dedup name
| table name cotal Tp3 cAvg imptotal Tp4 T1 allctotal allimpacttotal T2 TPA5
0 Karma

gcusello
SplunkTrust
SplunkTrust

Hi ccloutralex,
if you read the most answers about join, you find that join is a command to use only when it isn't possible to use a different approach because has two problems:

  • it's a slow command,
  • there the limit of 50,000 results in subsearches.

So at first check the number of results in subsearches.
Then try to group at least two searches: the main end one of the others, the one with more results, using stats, something like this:

(index=v sourcetype=json:search_int AND (payload="*X*") status!="Rem") OR (index=vul sourcetype="json:search_vul" indexr="v2" description!="** REJECT **")
| eval vul=lower(vul), data_to_be_joined_1=lower(data_to_be_joined_1), computer_name=lower(computer_name), data_to_be_joined_1=lower(data_to_be_joined_1x) 
| rename vul AS Vul sev AS Sev secondaryid AS ip_address
| stats values(Vul) AS vul values(Sev) AS Sev values(computer_name) AS computer_name values(ip_address) AS ip_address BY data_to_be_joined_1 
| join computer_name,ip_address 
  [search index=vul sourcetype=json:map name!="None" indexr="S" 
 | eval computer_name=lower(computername) 
 | eval ip_address = ipv4]
 | eval Sev=upper(Sev)
 | lookup xxx.csv base_sev as Sev OUTPUT avg(base) as BS2, avg(imp) as IS2
 | eval cadjusted=case((base=="Unknown" OR base=""),BS2,true(),base) 
 | eval impadjusted=case((imp=="Unknown" OR imp="") ,IS2,true(),imp) 
 | eventstats sum(imp) as impt by name
 | eventstats sum(cadjusted) as ctotal by name
 | eventstats sum(cvss_adjusted) as allcvsstotal
 | eventstats sum(impadjusted) as allimpacttotal
 | eval T1 = cvtotal + imptotal
 | eval T2 = allctotal + allimptotal
 | eval Tp3 = ctotal / allctotal
 | eval Tp4 = imptotal / allimptotal
 | eval TPA5 = (T1 / T2) * 100
 | eventstats avg(cadjusted) as cAvg
 | dedup name
 | table name cotal Tp3 cAvg imptotal Tp4 T1 allctotal allimpacttotal T2 TPA5

Ciao.
Giuseppe

0 Karma

ccloutralex
Observer

I appreciate your response! Unfortunately that search does not work. Lets make it a bit more simple. see below:

I have two sourcetypes:

(index=vulnerability sourcetype=json:id) with the following fields:
computername
secondaryid
id

(sourcetype="json:impacts") with the following fields:
c_id
cw_id
bs
is

I am having trouble with the limits on the 'join' command so i need to use stats or something of the sort to 'join' this data. The two common fields in the source types that need to be joined are 'id' from the first one and 'c_id' from the second. The end result will show a table with the computername,secondaryid,(common id/c_id) with the corresponding cw_id,bs and is fields.

My current search is below but it does not work. it only shows computername,ipv4,id and then the bs,is and cw_id fields are blank, but when looking through pages the bs,is and cw_id fields ARE displayed by not with their corresponding computername,ipv4 and id.

(index=vulnerability sourcetype=json:cve) OR (sourcetype="json:impacts") 
| eval computername=lower(computername),id=lower(id),c_id=lower(c_id),cw_id=lower(cw_id)
| rename c_id as id 
| stats values(secondaryid) as ipv4 values(bs) as bs values(is) as is values(id) as id values(cw_id) as cwid by computername
0 Karma

gcusello
SplunkTrust
SplunkTrust

Hi ccloutralex,
the join key is "id" npot computername, so this is the value you have to use in the BY clause to join results, computername is present only in the first search, try something like this:

 (index=vulnerability sourcetype=json:cve) OR (sourcetype="json:impacts") 
 | eval computername=lower(computername),id=lower(id),c_id=lower(c_id),cw_id=lower(cw_id)
 | rename c_id as id 
 | stats values(secondaryid) as ipv4 values(bs) as bs values(is) as is values(computername) as computername values(cw_id) as cwid by id

Ciao.
Giuseppe

0 Karma

ccloutralex
Observer

When i do it this way it only shows me id,bs,is,cwid but not computer_name or secondaryid. I dont know if this is causing an issue but there could be multiple computer_name/secondaryid combos with the same id.

0 Karma
Get Updates on the Splunk Community!

Index This | I am a number, but when you add ‘G’ to me, I go away. What number am I?

March 2024 Edition Hayyy Splunk Education Enthusiasts and the Eternally Curious!  We’re back with another ...

What’s New in Splunk App for PCI Compliance 5.3.1?

The Splunk App for PCI Compliance allows customers to extend the power of their existing Splunk solution with ...

Extending Observability Content to Splunk Cloud

Register to join us !   In this Extending Observability Content to Splunk Cloud Tech Talk, you'll see how to ...