Join only returning values from one side of the search!

thisissplunk
Builder

I need to join data from two (or more, ultimately) different sourcetypes based on the shared "host" field. A subsearch alone will not achieve what I want here because of how I want to aggregate the data with stats. This join almost works:

index=customer_1 sourcetype=http 
 [search index=customer_1  sourcetype=dns | dedup host | fields host]
| join host [search index=customer_1 sourcetype=dns] 
| stats values(dns_answer) values(http_uri) values(http_response) values(sourcetype) dc(sourcetype) values(_time) count by http_domain

It displays the expected data, almost! However, values(sourcetype) is only returning "dns" as a value! Join is not pushing the sourcetype value of "http" into the events, even though all of the other http fields are being joined correctly and displaying in the stats output. I imagine any other fields that share a name would only get the right side's value as well but I haven't tested. What gives?

Side note - I use a subsearch on the left side of the join because without it there would be too much data to feed into the right side. I'm not entirely sure what I'm doing is best practice, but the way joins work is actually backwards from how I want to use them here, efficiency-wise (I wish I could flip the left and right queries, but that doesn't work).

Anyways, anyone know what is going on here with the missing values from the left side?

sundareshr
Legend

Have you tried it without the subsearch and join, like this?

index=customer_1 (sourcetype=http OR sourcetype=dns)
| stats dc(host) as hosts values(dns_answer) values(http_uri) values(http_response) values(sourcetype) dc(sourcetype) values(_time) count by http_domain
| where hosts=1

thisissplunk
Builder

So this would work under normal circumstances. However, I need to group the events together if and only if a certain kind of event exists in the first place. If that event exists, I pull the "host" out of it with a subsearch and then search for that value in other places. To confuse the point further, I'm not really using "host" in my real query. The value could be a host, an IP, etc., so I cannot effectively "count by http_domain" because I'll never know what was given to me first.

These logs are coming from an alert system.
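
For what it's worth, the symptom in the original question looks consistent with the join command's overwrite option, which defaults to true: when both sides have a field with the same name, the subsearch (right side) value replaces the main search value. That would explain why values(sourcetype) only shows "dns", and the same would apply to any other shared field name. A minimal sketch of one possible workaround, reusing the index, sourcetypes, and field names from the original query: rename sourcetype on each side before the join so the two values cannot collide, then recombine them. The names http_sourcetype, dns_sourcetype, and sourcetypes are made up for illustration, and this has not been run against the poster's data.

```spl
index=customer_1 sourcetype=http
    [search index=customer_1 sourcetype=dns | dedup host | fields host]
| rename sourcetype AS http_sourcetype
| join host
    [search index=customer_1 sourcetype=dns
    | rename sourcetype AS dns_sourcetype]
| eval sourcetypes=mvappend(http_sourcetype, dns_sourcetype)
| stats values(dns_answer) values(http_uri) values(http_response) values(sourcetypes) dc(sourcetypes) count by http_domain
```

Note that join also defaults to max=1, so each left-side event only picks up the first matching dns event for its host. If every matching dns event is needed, max=0 removes that limit, at the cost of a heavier join.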
