Splunk Search

Use Join but also display non matching datasets

davidb89
Engager

I'm currenty trying to combine data from our firewall and sysmon which is running on a testclient. I want to join the Commandline and the PID of the causing process to the firewall information. That works pretty well but I can then only see the datasets which were sucessfully joined. I want to extend the datasets that have been joined but I also want to keep the ones that couldn't be joined. This is my search:

sourcetype="pfsensefirewall" source_address=10.0.2.10 firewall_action=block | join source_address, source_port, destinatin_port, destination_address [search sourcetype=nxlog_sysmon | rename SourceIp AS source_address, SourcePort AS source_port, DestinationPort AS destination_port, DestinationIp AS destination_address] | table source_address, destination_address, destination_port, Image

How can I extend the datasets that match the join but stell display all the unmatched ones?

Tags (2)
0 Karma
1 Solution

somesoni2
Revered Legend

With join (which is very expensive BTW), you get option to do left join (keep everything from LHS of join) and can be done like this (modified the join subsearch to include aggregation command)

sourcetype="pfsensefirewall" source_address=10.0.2.10 firewall_action=block | join type=left source_address, source_port, destinatin_port, destination_address [search sourcetype=nxlog_sysmon  | stats count by SourceIp SourcePort DestinationPort DestinationoIp | rename SourceIp AS source_address, SourcePort AS source_port, DestinationPort AS destination_port, DestinationIp AS destination_address] | table source_address, destination_address, destination_port, Image

But there may be a better alternative using stats. See if this give your results that you need (guessing it would work as you just doesn't seem to bother about time)

( sourcetype="pfsensefirewall" source_address=10.0.2.10 firewall_action=block) OR (sourcetype=nxlog_sysmon)
| eval source_address=coalesce(source_address,SourceIp) | eval source_port=coalesce(source_port,SourcePort)
| eval destination_address=coalesce(destination_address,DestinationIp) | eval destination_port=coalesce(destination_port,DestinationPort)
| stats values(Image) as Image by source_address, destination_address, destination_port

View solution in original post

somesoni2
Revered Legend

With join (which is very expensive BTW), you get option to do left join (keep everything from LHS of join) and can be done like this (modified the join subsearch to include aggregation command)

sourcetype="pfsensefirewall" source_address=10.0.2.10 firewall_action=block | join type=left source_address, source_port, destinatin_port, destination_address [search sourcetype=nxlog_sysmon  | stats count by SourceIp SourcePort DestinationPort DestinationoIp | rename SourceIp AS source_address, SourcePort AS source_port, DestinationPort AS destination_port, DestinationIp AS destination_address] | table source_address, destination_address, destination_port, Image

But there may be a better alternative using stats. See if this give your results that you need (guessing it would work as you just doesn't seem to bother about time)

( sourcetype="pfsensefirewall" source_address=10.0.2.10 firewall_action=block) OR (sourcetype=nxlog_sysmon)
| eval source_address=coalesce(source_address,SourceIp) | eval source_port=coalesce(source_port,SourcePort)
| eval destination_address=coalesce(destination_address,DestinationIp) | eval destination_port=coalesce(destination_port,DestinationPort)
| stats values(Image) as Image by source_address, destination_address, destination_port

davidb89
Engager

Thanks for your answer! It does work well and as you guessed correctly time is not a factor in this case. But out of curiosity: Is there any way to get the same result in a more efficient way?

0 Karma

davidb89
Engager

The more effective solution comes with some serious problems. We don't currently have the exact same basis of data in both sourcetypes. When I'm using the stats search the datasets are merged eventhough there often aren't two matchind entries. I only want them to show up if there are two matching entries. I looked at all the documentation I could find about the coalesce method but it does not seem to be possible, is it?

0 Karma

MuS
Legend

the stats search created by @somesoni2 is your efficient way to do it 😉

cheers, MuS

0 Karma
Get Updates on the Splunk Community!

Extending Observability Content to Splunk Cloud

Watch Now!   In this Extending Observability Content to Splunk Cloud Tech Talk, you'll see how to leverage ...

More Control Over Your Monitoring Costs with Archived Metrics!

What if there was a way you could keep all the metrics data you need while saving on storage costs?This is now ...

New in Observability Cloud - Explicit Bucket Histograms

Splunk introduces native support for histograms as a metric data type within Observability Cloud with Explicit ...