I have two sourcetypes that have URL fields. I am attempting to remove the
. so that both fields are just letters and numbers. When I run the searches separately, they work correctly. When I combine with a join, it returns no results. There is matching data in the sources.
sourcetype=aws:s3 | where RESULT=="Blocked" | fields * | rex field=AWSURL mode=sed "s/[^a-zA-Z0-9]//g" | join WINURL [search sourcetype=win_dns | fields * |rex field=WINURL mode=sed "s/[^a-zA-Z0-9]//g" ] | where WINURL==AWSURL | table WINIP, WINURL, AWSURL, URL, SITE, Block_Category
this is supposed to match AWSURL to WINURL from the sed strip so www[dot]google[dot]com is wwwgooglecom in both. This does work when not joining and comparing. The logic is if a URL from aws:S3 is blocked, match the URL request from windns and grab the WINIP from windns and combine with the aws:S3 blocked data. Thanks for the help
My 2 cents to avoid the JOIN by using a single search, make it faster by removing the intermediary WHERE.
then you can verify the results and add the where condition at the very end.
(sourcetype=aws:s3 RESULT="Blocked" ) OR sourcetype=win_dns | rex field=AWSURL mode=sed "s/[^a-zA-Z0-9]//g" | rex field=WINURL mode=sed "s/[^a-zA-Z0-9]//g" | stats values(WINIP) values(URL) values(SITE) values(Block_Category) by WINURL AWSURL | where WINURL=AWSURL
You are joining on WINURL, but only have WINURL on the joined search, not on the base search (where AWSURL is the field involved).
To get around this, rename the WINURL on the join side so you have the field AWSURL available on both sides to join on. If you need AWSURL or WINURL for other reasons, you can use something like
| eval WINURL=AWSURL too. The rest may work (and this means you won't need the
where, I don't think.
sourcetype=aws:s3 | where RESULT=="Blocked" | fields * | rex field=AWSURL mode=sed "s/[^a-zA-Z0-9]//g" | join AWSURL[search sourcetype=win_dns | fields * |rex field=WINURL mode=sed "s/[^a-zA-Z0-9]//g" | rename WINURL AS AWSURL ] | table WINIP, WINURL, AWSURL, URL, SITE, Block_Category
Give that a shot, let us know if it works!
If it doesn't, try running the two searches (the base search and the joined search) entirely separately and manually confirm you have fields in common they'd join on.
The search is not matching the correct fields from the second windns sourcetype. I have run these directly on the two sourcetypes and see matches that are not filling in the WINIP from the second search. I changed to an outer join and the data from the windns source is not matching after the first match. I have found many more matches that should be found in this search as well.
DATETIME 126.96.36.199 URL1 NY
DATETIME2 NONE URL3 NY
DATETIME3 188.8.131.52 URL1 NY
DATETIME4 184.108.40.206 URL4 NY
so the IP is carried to a match that is a different URL which matches but does not have that IP. Most of the time the match is not filling in the WINIP. I have compared the strings using python regex and the match with the same substring.
I'm not sure I followed your last paragraph above. In your example events, which are the "second win_dns" sourcetype ones?
Even better: Could you perhaps paste a few non-obfuscated events? Or failing that, a few carefully but only barely redacted events from each sourcetype, and then show how those events should fit together? I think we can sort you out, but I am not quite sure I see how the data fits yet.
Did @rich7177's answer solve your question? If yes, make sure to click "Accept" directly below his answer to resolve this post. If no, can you provide additional information?