With the assistance of this forum, I managed to combine the events of two sourcetypes and run stats to correlate the fields on a single shared field between the two sourcetypes. The problem is, when running stats, it creates a table with mostly blank spots and only a few with all columns filled.
The search is meant to look at switch logs and pull connection data inlcuding the MAC, the IP, the switch name, and the port id. Secondly, the search pulls from a sourcetype containing all devices that have been active on the network where it pulls the hostname and mac for that device. I then use stats to match those results up on the shared mac address field—the only difference between them being the mac field from one of the sourcetypes is in lowercase vs upper. The end goal of this is to be able to have a table showing me a device's name, it's IP, it's MAC, and which switch and port it connected to.
As it is, the search does appear to work, however, because of how it's written, the resulting table is filled with blank spots from where the events from each source don't have the fields from the other source. How can I change things so it only shows rows where it has an entry for each column? Right now, based on other posts I've seen on this forum.....I'm considering whether I may be able to use eval and create fields like src_mac-{index} or something like that.....maybe with the inclusion of the coalesce command. Is this the right course of action, or is there a better way?
The only other consideration is speed.....unfortunately, there's a very good chance I may end up searching millions of events. I'm trying to find ways restrict the search, but even if I manage to, it's still going to be a lot. I'm not trying to get an instant search, but if I can get it complete in less than thirty seconds as opposed to 3+ minutes.....
Thank you
(index="routerswitch" action_type IN(Failed_Attempts, Passed_Attempts) src_mac=* SwitchName=switch1 Port_Id=GigabitEthernet1/0/21 earliest=-30d) OR (index=connections source="/var/devices.log" src_ip=172.* earliest=-30d src_mac=*)
| fields src_mac dhcp_host_name src_ip IP_Address SwitchName Port_Id
| eval src_mac=upper(src_mac)
| stats values(dhcp_host_name) as hostname values(src_ip) as IP values(IP_Address) as net_IP values(SwitchName) as switch values(Port_Id) as portID by src_mac
(index="routerswitch" action_type IN(Failed_Attempts, Passed_Attempts) src_mac=* SwitchName=switch1 Port_Id=GigabitEthernet1/0/21 earliest=-30d) OR (index=connections source="/var/devices.log" src_ip=172.* earliest=-30d src_mac=*)
| fields src_mac dhcp_host_name src_ip IP_Address SwitchName Port_Id
| eval src_mac=upper(src_mac)
| stats values(dhcp_host_name) as hostname values(src_ip) as IP values(IP_Address) as net_IP values(SwitchName) as switch values(Port_Id) as portID by src_mac
| where isnotnull(hostname) AND isnotnull(IP) AND isnotnull(net_IP) AND isnotnull(switch) AND isnotnull(portID)
The by clause will match with events with exactly the same src_mac - this includes any trailing or leading spaces, punctuation, etc. Since MAC addresses are potentially sensitive information, which you might not wish to share, are there any differences between the way the MAC addresses are stored in the different events (apart from the upper/lower case you already mentioned)?
There are absolutely no differences in the src_mac. The search *does* find the correct results where the src_mac in each sourcetype match and the full device data is shown. It's just that the stats command doesn't appear to *require* that there be a matching src_mac in each sourcetype so it can pull all the required fields from each. The end result being a table that may contain a devices src_mac and hostname....but is missing the switch port and name. Or the opposite where I'm missing the hostname but have the rest of the info.
If needed, I'll fabricate some results.
(index="routerswitch" action_type IN(Failed_Attempts, Passed_Attempts) src_mac=* SwitchName=switch1 Port_Id=GigabitEthernet1/0/21 earliest=-30d) OR (index=connections source="/var/devices.log" src_ip=172.* earliest=-30d src_mac=*)
| fields src_mac dhcp_host_name src_ip IP_Address SwitchName Port_Id
| eval src_mac=upper(src_mac)
| stats values(dhcp_host_name) as hostname values(src_ip) as IP values(IP_Address) as net_IP values(SwitchName) as switch values(Port_Id) as portID by src_mac
| where isnotnull(hostname) AND isnotnull(IP) AND isnotnull(net_IP) AND isnotnull(switch) AND isnotnull(portID)
It works! Thank you very much!