Splunk Search

How to join two searches using one field AND a time constraint between the correlated events?

corti77
Communicator

Hi,

I am trying to get a list of workstations trying to connect to malicious DNS using PaloAlto and SYSMON logs.

From PaloAlto logs I get the list of malicious domains detected and blocked with the following query and I do a join statement looking for each malicious domain a DNS request entry in the sysmon log.

The query 

 

index="pan_logs" dns sourcetype="pan:threat" dest_zone=External dest_port=53 vendor_action=sinkhole  (action=dropped OR action=blocked) 
| dedup _time,file_name 
| table _time file_name 
| rename file_name as QueryName 
| join QueryName 
    [ search index=sysmon EventID=22 
    | eval host_querying=Computer
    | table QueryName, host_querying] 
| table _time QueryName host_querying

 

my issue comes when there are several computers accessing to the same malicious domain. The first occurrence found in the sysmon index is assigned to all the requests.

I would like to join based on the domain and a time limit between correlated events.

is it possible to do this?

Labels (1)
Tags (1)
0 Karma
1 Solution

yuanliu
SplunkTrust
SplunkTrust

So, the missed requirement is that QueryName must appear in both indices.  That's easy.  Just count the indices.

(index="pan_logs" sourcetype="pan:threat" vendor_action=sinkhole )
OR (index=sysmon EventID=22)
| bin span=5m _time
| eval QueryName = coalesce(file_name, QueryName)
| stats values(Computer) as host_querying dc(index) as sources by _time QueryName
| where sources == 2

 

View solution in original post

Tags (1)

yuanliu
SplunkTrust
SplunkTrust

The command you are looking for is bin.  But for simple correlation like this, I'd also avoid using join.  Retrieve events from both sources and use stats.

(index="pan_logs" dns sourcetype="pan:threat" dest_zone=External dest_port=53 vendor_action=sinkhole  (action=dropped OR action=blocked))
OR (index=sysmon EventID=22)
| bin span=5m _time
| eval QueryName = coalesce(file_name, QueryName)
| stats values(Computer) as host_querying by _time QueryName

Hope this helps.

Tags (2)
0 Karma

corti77
Communicator

Hi @yuanliu ,

Thanks for your answer but it returns wrong results.

When I run the first part of the query independently for the last 60 minutes, I receive 13 events

index="pan_logs" sourcetype="pan:threat" vendor_action=sinkhole 

But when I execute the whole query mixing both indexes I get 1728 events

(index="pan_logs" sourcetype="pan:threat" vendor_action=sinkhole )
OR (index=sysmon EventID=22)
| bin span=5m _time
| eval QueryName = coalesce(file_name, QueryName)
| stats values(Computer) as host_querying by _time QueryName

 

I follow the approach using bin but for some reason is wrongly catching much more events.

any idea of what could be the reason?

thanks a lot.

 

 

0 Karma

yuanliu
SplunkTrust
SplunkTrust

You mentioned 13 events in pan_logs.  What is event count in sysmon?

index=sysmon EventID=22

Does this make up the difference of 1728 - 13?  If the two searches joined with OR add up to 1728, event count is correct.

The important task is correlation.  You need to illustrate your data (anonymize as needed), explain key data characteristics, illustrate the results, then explain why the results are "wrong".

0 Karma

corti77
Communicator

Hi @byuan ,

I try to explain again what I expect to obtain.

With this first query I extract the malicious DNS requests that our Palo Alto firewall has blocked.
The result for the last 60 minutes is 6 occurrences.

index="pan_logs" sourcetype="pan:threat" vendor_action=sinkhole
| table _time, file_name

_time                                    file_name

2023-02-10 12:33:56       jingermy.com

2023-02-10 12:33:56       jingermy.com

2023-02-10 12:33:00       jingermy.com

2023-02-10 12:32:55       jingermy.com

2023-02-10 12:32:55       jingermy.com

2023-02-10 12:34:01       jingermy.com

Now I want to identify exactly which workstation asked for those malicious DNS resolutions.
For that, I leverage SYSMON logs which are easier to understand than the Windows DNS logs located in the DC servers.

The SYSMON query for the last 60 minutes returns 1466 events. Those events include legit and malicious ones. 

index=sysmon EventID=22 
| table QueryName, host

Now I would like to correlate both datasets with the final goal of obtaining 6 events with the URL(file_name filed) and the requester (host)

_time                                    file_name                    host

2023-02-10 12:33:56       jingermy.com          XXXXX

2023-02-10 12:33:56       jingermy.com          XXXXX

2023-02-10 12:33:00       jingermy.com          YYYYY

2023-02-10 12:32:55       jingermy.com          YYYYY

2023-02-10 12:32:55       jingermy.com          ZZZZZ

2023-02-10 12:34:01       jingermy.com          ZZZZZ

Using your query I get more than a thousand events, containing many legit domains.

Instead, using my query I get the same number of events before and after the join, BUT the host extracted from sysmon index corresponds with the first occurrence in the index and is not related to the original _time from Palo Alto logs.

index="pan_logs" sourcetype="pan:threat" vendor_action=sinkhole
| rename file_name as QueryName
| table _time,QueryName
| join   QueryName 
    [ search index=sysmon EventID=22 
        | table QueryName, host] 
| table _time QueryName host

I hope it is clear now. many thanks once again.

0 Karma

yuanliu
SplunkTrust
SplunkTrust

So, the missed requirement is that QueryName must appear in both indices.  That's easy.  Just count the indices.

(index="pan_logs" sourcetype="pan:threat" vendor_action=sinkhole )
OR (index=sysmon EventID=22)
| bin span=5m _time
| eval QueryName = coalesce(file_name, QueryName)
| stats values(Computer) as host_querying dc(index) as sources by _time QueryName
| where sources == 2

 

Tags (1)
Get Updates on the Splunk Community!

Stay Connected: Your Guide to July and August Tech Talks, Office Hours, and Webinars!

Dive into our sizzling summer lineup for July and August Community Office Hours and Tech Talks. Scroll down to ...

Edge Processor Scaling, Energy & Manufacturing Use Cases, and More New Articles on ...

Splunk Lantern is a Splunk customer success center that provides advice from Splunk experts on valuable data ...

Get More Out of Your Security Practice With a SIEM

Get More Out of Your Security Practice With a SIEMWednesday, July 31, 2024  |  11AM PT / 2PM ETREGISTER ...