Splunk Search

Why am I unable to match two fields from two sourcetypes using mode=sed with rex?

jbarto
New Member

I have two sourcetypes that have URL fields. I am attempting to remove the . so that both fields are just letters and numbers. When I run the searches separately, they work correctly. When I combine with a join, it returns no results. There is matching data in the sources.

sourcetype=aws:s3 | where RESULT=="Blocked" | fields * | rex field=AWSURL mode=sed "s/[^a-zA-Z0-9]//g" | join WINURL [search sourcetype=win_dns | fields * |rex field=WINURL mode=sed "s/[^a-zA-Z0-9]//g" ] | where WINURL==AWSURL | table WINIP, WINURL, AWSURL, URL, SITE, Block_Category

this is supposed to match AWSURL to WINURL from the sed strip so www[dot]google[dot]com is wwwgooglecom in both. This does work when not joining and comparing. The logic is if a URL from aws:S3 is blocked, match the URL request from win_dns and grab the WINIP from win_dns and combine with the aws:S3 blocked data. Thanks for the help

0 Karma

yannK
Splunk Employee
Splunk Employee

My 2 cents to avoid the JOIN by using a single search, make it faster by removing the intermediary WHERE.
then you can verify the results and add the where condition at the very end.

 (sourcetype=aws:s3  RESULT="Blocked" ) OR  sourcetype=win_dns 
 | rex field=AWSURL mode=sed "s/[^a-zA-Z0-9]//g" 
 | rex field=WINURL mode=sed "s/[^a-zA-Z0-9]//g" 
 |  stats values(WINIP) values(URL) values(SITE)  values(Block_Category) by WINURL AWSURL
| where WINURL=AWSURL

Richfez
SplunkTrust
SplunkTrust

You are joining on WINURL, but only have WINURL on the joined search, not on the base search (where AWSURL is the field involved).

To get around this, rename the WINURL on the join side so you have the field AWSURL available on both sides to join on. If you need AWSURL or WINURL for other reasons, you can use something like | eval WINURL=AWSURL too. The rest may work (and this means you won't need the where, I don't think.

sourcetype=aws:s3 | where RESULT=="Blocked" | fields * | rex field=AWSURL mode=sed "s/[^a-zA-Z0-9]//g" | join AWSURL[search sourcetype=win_dns | fields * |rex field=WINURL mode=sed "s/[^a-zA-Z0-9]//g" | rename WINURL AS AWSURL ] | table WINIP, WINURL, AWSURL, URL, SITE, Block_Category

Give that a shot, let us know if it works!

If it doesn't, try running the two searches (the base search and the joined search) entirely separately and manually confirm you have fields in common they'd join on.

0 Karma

jbarto
New Member

thank you for the response

0 Karma

jbarto
New Member

The search is not matching the correct fields from the second win_dns sourcetype. I have run these directly on the two sourcetypes and see matches that are not filling in the WINIP from the second search. I changed to an outer join and the data from the win_dns source is not matching after the first match. I have found many more matches that should be found in this search as well.

example data:

DATETIME 111.111.111.111 URL1 NY
DATETIME2 NONE URL3 NY

DATETIME3 111.111.111.111 URL1 NY
DATETIME4 111.111.111.111 URL4 NY

so the IP is carried to a match that is a different URL which matches but does not have that IP. Most of the time the match is not filling in the WINIP. I have compared the strings using python regex and the match with the same substring.

0 Karma

Richfez
SplunkTrust
SplunkTrust

I'm not sure I followed your last paragraph above. In your example events, which are the "second win_dns" sourcetype ones?

Even better: Could you perhaps paste a few non-obfuscated events? Or failing that, a few carefully but only barely redacted events from each sourcetype, and then show how those events should fit together? I think we can sort you out, but I am not quite sure I see how the data fits yet.

0 Karma

ppablo
Retired

Hi @jbarto

Did @rich7177's answer solve your question? If yes, make sure to click "Accept" directly below his answer to resolve this post. If no, can you provide additional information?

0 Karma

sundareshr
Legend

Have you looked at job inspector to see if the sub-search is returning any results? There are limitations to using join.

0 Karma
Get Updates on the Splunk Community!

Easily Improve Agent Saturation with the Splunk Add-on for OpenTelemetry Collector

Agent Saturation What and Whys In application performance monitoring, saturation is defined as the total load ...

Explore the Latest Educational Offerings from Splunk [January 2025 Updates]

At Splunk Education, we are committed to providing a robust learning experience for all users, regardless of ...

Developer Spotlight with Paul Stout

Welcome to our very first developer spotlight release series where we'll feature some awesome Splunk ...