Solved: Use Join but also display non matching datasets

davidb89 · ‎07-11-2017

I'm currenty trying to combine data from our firewall and sysmon which is running on a testclient. I want to join the Commandline and the PID of the causing process to the firewall information. That works pretty well but I can then only see the datasets which were sucessfully joined. I want to extend the datasets that have been joined but I also want to keep the ones that couldn't be joined. This is my search:

sourcetype="pfsensefirewall" source_address=10.0.2.10 firewall_action=block | join source_address, source_port, destinatin_port, destination_address [search sourcetype=nxlog_sysmon | rename SourceIp AS source_address, SourcePort AS source_port, DestinationPort AS destination_port, DestinationIp AS destination_address] | table source_address, destination_address, destination_port, Image

How can I extend the datasets that match the join but stell display all the unmatched ones?

somesoni2 · ‎07-11-2017

With join (which is very expensive BTW), you get option to do left join (keep everything from LHS of join) and can be done like this (modified the join subsearch to include aggregation command)

sourcetype="pfsensefirewall" source_address=10.0.2.10 firewall_action=block | join type=left source_address, source_port, destinatin_port, destination_address [search sourcetype=nxlog_sysmon  | stats count by SourceIp SourcePort DestinationPort DestinationoIp | rename SourceIp AS source_address, SourcePort AS source_port, DestinationPort AS destination_port, DestinationIp AS destination_address] | table source_address, destination_address, destination_port, Image

But there may be a better alternative using stats. See if this give your results that you need (guessing it would work as you just doesn't seem to bother about time)

( sourcetype="pfsensefirewall" source_address=10.0.2.10 firewall_action=block) OR (sourcetype=nxlog_sysmon)
| eval source_address=coalesce(source_address,SourceIp) | eval source_port=coalesce(source_port,SourcePort)
| eval destination_address=coalesce(destination_address,DestinationIp) | eval destination_port=coalesce(destination_port,DestinationPort)
| stats values(Image) as Image by source_address, destination_address, destination_port

View solution in original post

somesoni2 · ‎07-11-2017

With join (which is very expensive BTW), you get option to do left join (keep everything from LHS of join) and can be done like this (modified the join subsearch to include aggregation command)

sourcetype="pfsensefirewall" source_address=10.0.2.10 firewall_action=block | join type=left source_address, source_port, destinatin_port, destination_address [search sourcetype=nxlog_sysmon  | stats count by SourceIp SourcePort DestinationPort DestinationoIp | rename SourceIp AS source_address, SourcePort AS source_port, DestinationPort AS destination_port, DestinationIp AS destination_address] | table source_address, destination_address, destination_port, Image

But there may be a better alternative using stats. See if this give your results that you need (guessing it would work as you just doesn't seem to bother about time)

( sourcetype="pfsensefirewall" source_address=10.0.2.10 firewall_action=block) OR (sourcetype=nxlog_sysmon)
| eval source_address=coalesce(source_address,SourceIp) | eval source_port=coalesce(source_port,SourcePort)
| eval destination_address=coalesce(destination_address,DestinationIp) | eval destination_port=coalesce(destination_port,DestinationPort)
| stats values(Image) as Image by source_address, destination_address, destination_port

davidb89 · ‎07-11-2017

Thanks for your answer! It does work well and as you guessed correctly time is not a factor in this case. But out of curiosity: Is there any way to get the same result in a more efficient way?

davidb89 · ‎07-12-2017

The more effective solution comes with some serious problems. We don't currently have the exact same basis of data in both sourcetypes. When I'm using the stats search the datasets are merged eventhough there often aren't two matchind entries. I only want them to show up if there are two matching entries. I looked at all the documentation I could find about the coalesce method but it does not seem to be possible, is it?

MuS · ‎07-11-2017

the stats search created by @somesoni2 is your efficient way to do it 😉

cheers, MuS

Use Join but also display non matching datasets

Stay Connected: Your Guide to November Tech Talks, Office Hours, and Webinars!

Transform your security operations with Splunk Enterprise Security

Splunk Admins and App Developers | Earn a $35 gift card!