Splunk Search

Help with converting JOIN to STATS/EVENTSTATS

tlmayes
Contributor

I started with the following query, required to join a knowledge library with discovered hosts. The results are stored in a summary index for quick(er) retrieval.  The first query is usually less than 100 events, the second is 70,000+ every time, but the whole thing runs in less than 60s.  The problem is the 50,000 JOIN subsearch limitation.

 

 

 

index=qualys sourcetype="qualys:hostDetection" QID=* SEVERITY IN (3 4 5) LAST_SCAN_DATETIME=*  earliest=-15min latest=now()
| join type=inner QID [search index=qualys sourcetype="qualys:knowledgebase" QID=* earliest=1 latest=now() 
| dedup QID | table QID THREAT_INTEL_VALUES CVSS_V3_BASE PUBLISHED_DATETIME THREAT_INTEL_IDS VENDOR_REFERENCE TITLE] 
|table _time IP DNS NETBIOS TRACKING_METHOD OS TAGS QID TITLE TYPE SEVERITY STATUS LAST_SCAN_DATETIME LAST_FOUND_DATETIME LAST_FIXED_DATETIME PUBLISHED_DATETIME THREAT_INTEL_VALUES THREAT_INTEL_IDS CVSS_V3_BASE VENDOR_REFERENCE RESULTS

 

 

 


In order to overcome the JOIN/subsearch limit and maybe gain some efficiencies I tried using eventstats instead.  The resultant query is below, and runs for over an hour with questionable results (never really finishes.  Pretty sure it is not giving me the same output as the JOIN.  What am I doing wrong?

New query

 

 

(index=syn_sec_qualys sourcetype="qualys:hostDetection" QID=* SEVERITY IN(3 4 5) LAST_SCAN_DATETIME=* earliest=-15m@m latest=now) OR (index=syn_sec_qualys sourcetype="qualys:knowledgebase" QID=* earliest=1 latest=now)
| eventstats values(_time) AS _time, values(DNS) AS DNS values(TRACKING_METHOD) AS TRACKING_METHOD values(NETBIOS) AS NETBIOS values(OS) AS OS, values(TAGS) AS TAGS, values(TITLE) AS TITLE, values(TYPE) AS TYPE, values(SEVERITY) AS SEVERITY, values(LAST_SCAN_DATETIME) AS LAST_SCAN_DATETIME, values(LAST_FOUND_DATETIME) AS LAST_FOUND_DATETIME, values(LAST_FIXED_DATETIME) AS LAST_FIXED_DATETIME values(PUBLISHED_DATETIME) AS PUBLISHED_DATETIME values(THREAT_INTEL_VALUES) AS THREAT_INTEL_VALUES, values(THREAT_INTEL_IDS) AS THREAT_INTEL_IDS values(CVSS_V3_BASE) AS CVSS_V3_BASE, values(VENDOR_REFERENCE) AS VENDOR_REFERENCE, values(RESULTS) AS RESULTS BY IP, QID
| TABLE _time IP DNS NETBIOS TRACKING_METHOD OS TAGS QID TITLE TYPE SEVERITY STATUS LAST_SCAN_DATETIME LAST_FOUND_DATETIME LAST_FIXED_DATETIME PUBLISHED_DATETIME THREAT_INTEL_VALUES THREAT_INTEL_IDS CVSS_V3_BASE VENDOR_REFERENCE RESULTS

 

 

 

 

Labels (3)
0 Karma
1 Solution

ITWhisperer
SplunkTrust
SplunkTrust

Try shortening the subsearch with another subsearch

index=qualys sourcetype="qualys:hostDetection" QID=* SEVERITY IN (3 4 5) LAST_SCAN_DATETIME=*  earliest=-15min latest=now()
| join type=inner QID 
  [search index=qualys sourcetype="qualys:knowledgebase" QID=* earliest=1 latest=now()
    [search index=qualys sourcetype="qualys:hostDetection" QID=* SEVERITY IN (3 4 5) LAST_SCAN_DATETIME=* earliest=-15min latest=now()
    | dedup QID
    | fields QID]
  | dedup QID
  | table QID THREAT_INTEL_VALUES CVSS_V3_BASE PUBLISHED_DATETIME 
    THREAT_INTEL_IDS VENDOR_REFERENCE TITLE] 
| table _time IP DNS NETBIOS TRACKING_METHOD OS TAGS QID TITLE TYPE SEVERITY STATUS LAST_SCAN_DATETIME LAST_FOUND_DATETIME LAST_FIXED_DATETIME PUBLISHED_DATETIME THREAT_INTEL_VALUES THREAT_INTEL_IDS CVSS_V3_BASE VENDOR_REFERENCE RESULTS

View solution in original post

tlmayes
Contributor

The help is appreciated!
The second search is necessary in that it is a 'library'.  In my original query the DEDUP was necessary to ensure only the most recent "library" event is used.  The same cannot be said for the data in the first query since I want all instances. 
Is part of the load now that the overall query is being forced to compare all records between the two queries, when in fact the comparison should only be between the [all] in the first query (generally less than 10 events), against only [newest QID] from the second query (if using dedup, 77,000 records)?   Not being able to use dedup in the suggested format means the second query is now having to compare 200,000 events.
The second search (without the 50,000) limit, is only 77,000 records.  Even with the limit as an outcome of my original JOIN, it completes in ~ 5 mins but unfortunately not accurate since it is missing ~22,000 relative records.  In every case where I use stats or eventstats it runs for over an hour.  

0 Karma

ITWhisperer
SplunkTrust
SplunkTrust

Given that dedup (from the original subquery) appears to be effectively the same as latest(), and you only want to be adding the fields from the subquery with eventstats (not values from all the events), try something like this

(index=syn_sec_qualys sourcetype="qualys:hostDetection" QID=* SEVERITY IN(3 4 5) LAST_SCAN_DATETIME=* earliest=-15m@m latest=now) OR (index=syn_sec_qualys sourcetype="qualys:knowledgebase" QID=* earliest=1 latest=now)
| eventstats latest(THREAT_INTEL_VALUES) as THREAT_INTEL_VALUES latest(CVSS_V3_BASE) as CVSS_V3_BASE latest(PUBLISHED_DATETIME) as PUBLISHED_DATETIME latest(THREAT_INTEL_IDS) as THREAT_INTEL_IDS latest(VENDOR_REFERENCE) as VENDOR_REFERENCE latest(TITLE) as TITLE by QID
| where isnotnull(SEVERITY)
| TABLE _time IP DNS NETBIOS TRACKING_METHOD OS TAGS QID TITLE TYPE SEVERITY STATUS LAST_SCAN_DATETIME LAST_FOUND_DATETIME LAST_FIXED_DATETIME PUBLISHED_DATETIME THREAT_INTEL_VALUES THREAT_INTEL_IDS CVSS_V3_BASE VENDOR_REFERENCE RESULTS
0 Karma

tlmayes
Contributor

Thanks for the quick response, and apologies for my delay.  
Executed your query as written.  Unfortunately it is extremely slow compared to my original query.  Unexpectedly, using stats or eventstats seems to be very slow compared to using my original JOIN query

0 Karma

ITWhisperer
SplunkTrust
SplunkTrust

Try shortening the subsearch with another subsearch

index=qualys sourcetype="qualys:hostDetection" QID=* SEVERITY IN (3 4 5) LAST_SCAN_DATETIME=*  earliest=-15min latest=now()
| join type=inner QID 
  [search index=qualys sourcetype="qualys:knowledgebase" QID=* earliest=1 latest=now()
    [search index=qualys sourcetype="qualys:hostDetection" QID=* SEVERITY IN (3 4 5) LAST_SCAN_DATETIME=* earliest=-15min latest=now()
    | dedup QID
    | fields QID]
  | dedup QID
  | table QID THREAT_INTEL_VALUES CVSS_V3_BASE PUBLISHED_DATETIME 
    THREAT_INTEL_IDS VENDOR_REFERENCE TITLE] 
| table _time IP DNS NETBIOS TRACKING_METHOD OS TAGS QID TITLE TYPE SEVERITY STATUS LAST_SCAN_DATETIME LAST_FOUND_DATETIME LAST_FIXED_DATETIME PUBLISHED_DATETIME THREAT_INTEL_VALUES THREAT_INTEL_IDS CVSS_V3_BASE VENDOR_REFERENCE RESULTS

tlmayes
Contributor

Believe you nailed it!  And, never occurred to me to filter within the JOIN, before producing the final results.  Thanks for the help, and the quick responses.  

0 Karma

ITWhisperer
SplunkTrust
SplunkTrust

To be fair, your second search is effectively doing an alltime search which, not surprisingly, is taking a long time. Can you limit the time frame for this search?

0 Karma
Get Updates on the Splunk Community!

How to Monitor Google Kubernetes Engine (GKE)

We’ve looked at how to integrate Kubernetes environments with Splunk Observability Cloud, but what about ...

Index This | How can you make 45 using only 4?

October 2024 Edition Hayyy Splunk Education Enthusiasts and the Eternally Curious!  We’re back with this ...

Splunk Education Goes to Washington | Splunk GovSummit 2024

If you’re in the Washington, D.C. area, this is your opportunity to take your career and Splunk skills to the ...