I want to find the number of events occurring in sourcetype=B based on the distinct Device_MAC_Address searched from sourcetype=A. Note that I have the same field “Device_MAC_Address” extracted from both sourcetype events shown in the following example logs. I want to perform this search over a week period. Wondering if I can do this with sub search sourcetype=A being the inner and sourcetype=B being the outer. The current number of events in sourcetype=A returns about a million records over 60 min of search. The challenge is to pass all distinct MAC address to the outer search and my search will need to span for over a week, and I can’t seem to find a solution for this.
I will need some help from the experts here with the command I can use to get my search successfully and efficiently performed.
Sourcetype=A: In this example log the field Device_MAC_Address is AB-CD-EF-AB-CD-EF
2017-12-11T00:53:30.223Z,au:05,ASIAuth,XXX.XX.XX.XXX,abcd.efab.abcd,abcd.efab.cdef,0/0/0/201,abcdef-ab02cde03.xxxx.xx.com,AB-CD-EF-AB-CD-EF,00-11-22-33-44-55:XXXXXXX,00-11-22-33-44-55,,,XXXXXXX,2ms,0A0040020500000017F7F1EC,,,FAILURE,Unknown,MacAssociationNotFound,Search NoSuch Object: Protocol operation=SearchResultDone| Message Id=6180853| Result Code=32| Matched DN=ou=2|o=Devices| Error Message=NDS error: no such entry (-601)
Sourcetype=B: In this example log the field Device_MAC_Address is AB-CD-EF-AB-CD-EF
2017-12-11T00:46:23.749Z,lu:07,MSIAuth,xxx.xx.xx.xxx,!firstname.lastname@example.org,abcdef,0/0/0/201,chabcdef-ab02cde04.xxxx.xx.com, AB-CD-EF-AB-CD-EF,00-11-22-33-44-55:XXXXXXX,00-11-22-33-44-55,,,XXXXXXX,69ms,0A084003040000000DCC9E12,86400,,FAILURE,Unknown,UserNotFound,Search failure| found no entries
If you only need the eventcount, then this should do the trick:
index=foo sourcetype=A OR sourcetype=B Device_MAC_Address=* | eval isFromA=if(sourcetype=="A",1,0) | eval isFromB=if(sourcetype=="B",1,0) | stats sum(isFromA) as A sum(isFromB) as eventcount by Device_MAC_Address | search A>0 eventcount>0 | fields - A
You may wanna watch this presentation, it gives you a good start in questions like this: http://conf.splunk.com/sessions/2016-sessions.html#search=Let%20Stats%20Sort%20Them%20Out&
A better approach for large datasets is to actually use
stats rather than subsearch to do the join, e.g.
(sourcetype=A <more sourcetypeA predicates>) OR (sourcetype=B <more sourcetypeA predicates>) | stats dc(Device_MAC_Address) AS sourcetypes <more stats aggregates> | search sourcetypes=2 ....