Splunk Search

How to replace Join to search large data sets

dtaylor
Explorer

I've been working on a search that I *finally* managed to get working that would look for events generated by a provided network switch and port name and then gives me all the devices that have connected to that specific port over a period of time. Fortunately, most of the device data is included alongside the events which contain the switch/port information.....that is....evenything except the hostname. Because of this, I've tried to use the join command to perform a second search through a second data set which contains the hostnames for all devices which have connected to the network and match those hostnames based on the shared MAC address field.

The search works, and that's great, but it can only work over a time period of about a day or so before the subsearch breaks past the 50k event limit. Is these anyway I can get rid of the join command and maybe use the stats command instead? That's what simialr posts to this one seem to suggest, but I have trouble wrapping my head around how the stats command can be used to correlate data from two different events from different data sets.....in this case the dhcp_host_name getting matched to the corresponding device in my networking logs. I'll gladly take any assistance. Thank you.



 

 

 

index="indexA" log_type IN(Failed_Attempts, Passed_Authentications) IP_Address=* SwitchID=switch01 Port_Id=GigabitEthernet1/0/13
| rex field=message_text "\((?<src_mac>[A-Fa-f0-9]{4}\.[A-Fa-f0-9]{4}\.[A-Fa-f0-9]{4})\)" 
| eval src_mac=lower(replace(src_mac, "(\w{2})(\w{2})\.(\w{2})(\w{2})\.(\w{2})(\w{2})", "\1:\2:\3:\4:\5:\6"))
| eval time=strftime(_time,"%Y-%m-%d %T") 
| join type=left left=L right=R max=0 where L.src_mac=R.src_mac L.IP_Address=R.src_ip
    [| search index="indexB" source="/var/logs/devices.log"
    | fields src_mac src_ip dhcp_host_name] 
| stats values(L.time) AS Time, count as "Count" by L.src_mac R.dhcp_host_name L.IP_Address L.SwitchID L.Port_Id 

 

Labels (2)
0 Karma
1 Solution

MuS
Legend

Hi dtaylor,

You have seen this https://community.splunk.com/t5/Splunk-Search/How-to-compare-fields-over-multiple-sourcetypes-withou... since you are already thinking of using `stats` 

 

The important thing is really to get a common field from the various data sets and use that in your stats in your case you could use the field `src_mac` as simple as 

| stats values(*) AS * by _time src_mac

after your base search should work as long as you get src_mac for all data sets.

 

Hope this helps ...

cheers, MuS

 

View solution in original post

bowesmana
SplunkTrust
SplunkTrust

To build on what @MuS says, here's a simple example that simulates two data sets, the switch data (index A) and the devices data (index B) and the stats command shows how to "join" on the two.

So, everything up to the last two lines is just setting up dummy data sets to model your example and then the search/stats does sort of what you are looking to do - you can just copy/paste this to a search window

| makeresults count=1000
| fields - _time
| eval SwitchID=printf("Switch%02d",random() % 5)
| eval Mac=printf("00-B0-D0-63-C2-%02d", random() % 10)
| eval index="A"
| append [ 
  | makeresults count=1000
  | fields - _time
  | eval r=random() % 10
  | eval Mac=printf("00-B0-D0-63-C2-%02d", r)
  | eval dhcp_host_name=printf("Host%02d", r)
  | eval index="B", source="/var/logs/devices.log"
  | fields - r
]
| eval r=random() % 10
| sort r
| fields - r

``` Now we have a bunch of rows from index A and B```
| search (index="A" SwitchID=switch01) OR (index="B" source="/var/logs/devices.log")
| stats count values(dhcp_host_name) as dhcp_host_name values(SwitchID) as SwitchID by Mac

Hope this helps

MuS
Legend

Hi dtaylor,

You have seen this https://community.splunk.com/t5/Splunk-Search/How-to-compare-fields-over-multiple-sourcetypes-withou... since you are already thinking of using `stats` 

 

The important thing is really to get a common field from the various data sets and use that in your stats in your case you could use the field `src_mac` as simple as 

| stats values(*) AS * by _time src_mac

after your base search should work as long as you get src_mac for all data sets.

 

Hope this helps ...

cheers, MuS

 

Get Updates on the Splunk Community!

Harnessing Splunk’s Federated Search for Amazon S3

Managing your data effectively often means balancing performance, costs, and compliance. Splunk’s Federated ...

Infographic provides the TL;DR for the 2024 Splunk Career Impact Report

We’ve been buzzing with excitement about the recent validation of Splunk Education! The 2024 Splunk Career ...

Enterprise Security Content Update (ESCU) | New Releases

In December, the Splunk Threat Research Team had 1 release of new security content via the Enterprise Security ...