Hi,
I am trying to get a list of workstations trying to connect to malicious DNS using PaloAlto and SYSMON logs.
From PaloAlto logs I get the list of malicious domains detected and blocked with the following query and I do a join statement looking for each malicious domain a DNS request entry in the sysmon log.
The query
index="pan_logs" dns sourcetype="pan:threat" dest_zone=External dest_port=53 vendor_action=sinkhole (action=dropped OR action=blocked)
| dedup _time,file_name
| table _time file_name
| rename file_name as QueryName
| join QueryName
[ search index=sysmon EventID=22
| eval host_querying=Computer
| table QueryName, host_querying]
| table _time QueryName host_querying
my issue comes when there are several computers accessing to the same malicious domain. The first occurrence found in the sysmon index is assigned to all the requests.
I would like to join based on the domain and a time limit between correlated events.
is it possible to do this?
So, the missed requirement is that QueryName must appear in both indices. That's easy. Just count the indices.
(index="pan_logs" sourcetype="pan:threat" vendor_action=sinkhole )
OR (index=sysmon EventID=22)
| bin span=5m _time
| eval QueryName = coalesce(file_name, QueryName)
| stats values(Computer) as host_querying dc(index) as sources by _time QueryName
| where sources == 2
The command you are looking for is bin. But for simple correlation like this, I'd also avoid using join. Retrieve events from both sources and use stats.
(index="pan_logs" dns sourcetype="pan:threat" dest_zone=External dest_port=53 vendor_action=sinkhole (action=dropped OR action=blocked))
OR (index=sysmon EventID=22)
| bin span=5m _time
| eval QueryName = coalesce(file_name, QueryName)
| stats values(Computer) as host_querying by _time QueryName
Hope this helps.
Hi @yuanliu ,
Thanks for your answer but it returns wrong results.
When I run the first part of the query independently for the last 60 minutes, I receive 13 events
index="pan_logs" sourcetype="pan:threat" vendor_action=sinkhole
But when I execute the whole query mixing both indexes I get 1728 events
(index="pan_logs" sourcetype="pan:threat" vendor_action=sinkhole )
OR (index=sysmon EventID=22)
| bin span=5m _time
| eval QueryName = coalesce(file_name, QueryName)
| stats values(Computer) as host_querying by _time QueryName
I follow the approach using bin but for some reason is wrongly catching much more events.
any idea of what could be the reason?
thanks a lot.
You mentioned 13 events in pan_logs. What is event count in sysmon?
index=sysmon EventID=22
Does this make up the difference of 1728 - 13? If the two searches joined with OR add up to 1728, event count is correct.
The important task is correlation. You need to illustrate your data (anonymize as needed), explain key data characteristics, illustrate the results, then explain why the results are "wrong".
Hi @Anonymous ,
I try to explain again what I expect to obtain.
With this first query I extract the malicious DNS requests that our Palo Alto firewall has blocked.
The result for the last 60 minutes is 6 occurrences.
index="pan_logs" sourcetype="pan:threat" vendor_action=sinkhole
| table _time, file_name
_time file_name
2023-02-10 12:33:56 jingermy.com
2023-02-10 12:33:56 jingermy.com
2023-02-10 12:33:00 jingermy.com
2023-02-10 12:32:55 jingermy.com
2023-02-10 12:32:55 jingermy.com
2023-02-10 12:34:01 jingermy.com
Now I want to identify exactly which workstation asked for those malicious DNS resolutions.
For that, I leverage SYSMON logs which are easier to understand than the Windows DNS logs located in the DC servers.
The SYSMON query for the last 60 minutes returns 1466 events. Those events include legit and malicious ones.
index=sysmon EventID=22
| table QueryName, host
Now I would like to correlate both datasets with the final goal of obtaining 6 events with the URL(file_name filed) and the requester (host)
_time file_name host
2023-02-10 12:33:56 jingermy.com XXXXX
2023-02-10 12:33:56 jingermy.com XXXXX
2023-02-10 12:33:00 jingermy.com YYYYY
2023-02-10 12:32:55 jingermy.com YYYYY
2023-02-10 12:32:55 jingermy.com ZZZZZ
2023-02-10 12:34:01 jingermy.com ZZZZZ
Using your query I get more than a thousand events, containing many legit domains.
Instead, using my query I get the same number of events before and after the join, BUT the host extracted from sysmon index corresponds with the first occurrence in the index and is not related to the original _time from Palo Alto logs.
index="pan_logs" sourcetype="pan:threat" vendor_action=sinkhole
| rename file_name as QueryName
| table _time,QueryName
| join QueryName
[ search index=sysmon EventID=22
| table QueryName, host]
| table _time QueryName host
I hope it is clear now. many thanks once again.
So, the missed requirement is that QueryName must appear in both indices. That's easy. Just count the indices.
(index="pan_logs" sourcetype="pan:threat" vendor_action=sinkhole )
OR (index=sysmon EventID=22)
| bin span=5m _time
| eval QueryName = coalesce(file_name, QueryName)
| stats values(Computer) as host_querying dc(index) as sources by _time QueryName
| where sources == 2