Hello,
I am a big fan of using Join for combining results of different sourcetypes and indexes (especially with a type=left parameter) but I do see alot of hate in the community towards the usage of Join considering performance and resource usage at the backend.
Can anyone provide me with a valuable resource that shows methods for different scenarios as to when and which alternative method should be used instead of the infamous Join command, and at which scenario should Join be used as a last resort.
Regards,
Hi @zekiramhi,
try something like this:
(index=paloalto src=* dest=* policyid IN (1,2,3)) OR (sourcetype=dns index=linux)
| rename Source AS src Destination AS dest
| stats values(url) AS url values(policyid) AS policyid values(action) AS action values(query) as query BY src dest
| sort limit=0 src
N.B.: beware that "query" is a special word in Splunk, I'm not sure that you can use it as a field name, probably you have to rename it.
Ciao.
Giuseppe
Hi @zekiramhi,
usually the people that loves join are people that comes from SQL, but Splunk isn't a DB, it's a search engine, so you should try to think in a different way.
I arrived as you from SQL and I did this work at the beginning of my Splunk activity: I resetted my approach to data correlation.
The reasons to avoid join are essentially two.
How to have the same result of join?
You have to use the stats command, using BY clause for the join key and then use the options of stats to have the values you need.
e.g. if you have something like this:
index=indexA
| join key [ search index=indexB ]
| table _time key fieldA fieldB
you can have the same result with
index=indexA OR index=indexB
| stats earliest(_time) AS _time values(fieldA) AS fieldA values(fieldB) AS fieldB BY key
if you can have more values for fieldA or fieldB and you want only one, you can use last or first instead values clause.
Ciao.
Giuseppe
Hello Giuseppe,
Thank you for your response,
However, my main problem with the method you described is that I actually need specific field values from Index A, for example I only want to Join the DNS queries from index=B, only to specific source and destinations based on a couple of Policy IDs from index=A
Some of those policy IDs might not even have any query logs, so I also want them to return null values from index=B,
Hopefully you catch my drift,
Best Regards,
As I said, you have to create two separate searches (as if you're preparing a join), identifying the key fields and if you have to elaborate one of them (e.g. changing name).
then you can run a search with both the searches, correlating values by the key.
If you want more help, you could share your two searches and tell me the fields to use as key and the ones to display.
Ciao.
Giuseppe
Hello,
Absolutely, the key fields to join here would be the src and dest fields, an example query would be the following:
index=paloalto src=* dest=* policyid IN (1,2,3)
| stats values(url) by policyid src dest action
| sort limit=0 src
| join type=left src dest
[ search sourcetype=dns index=linux
| stats values(query) as query by Source Destination
| rename Source as src Destination as dest]
Best Regards,
Hi @zekiramhi,
try something like this:
(index=paloalto src=* dest=* policyid IN (1,2,3)) OR (sourcetype=dns index=linux)
| rename Source AS src Destination AS dest
| stats values(url) AS url values(policyid) AS policyid values(action) AS action values(query) as query BY src dest
| sort limit=0 src
N.B.: beware that "query" is a special word in Splunk, I'm not sure that you can use it as a field name, probably you have to rename it.
Ciao.
Giuseppe
Hello Giuseppe,
Yes I am very aware of the "query" and "search" being special field names that behave differently, I now understand the technique abit more but I think I am having problems with the duplicate or key field names IE: src and dest.
src seems to be bringing me results for both indexes and working fine but dest only brings me for one of them, and that changes based on what I rename first interestingly enough, Any ideas as to what might be the cause?
On a side note, on data models and tstats we have the ability to specifiy which field from which index to pull results from (example: index1.field1 or index2.field1), is there a way to do that for normal Splunk searches?
Apologies for the amount of questions I have bothered, just curious 😁
Best Regards,
Hi @zekiramhi,
As I said it's important to normalize fields using the same name in both the searches and eventually the same case (upper or lower).
If you haven't all the fields in both the searches, use only the one surely present in both and take the others as values or first or last.
about the fields choice, you have the fields command: list after the field command all the fields you need from both the searches.
About your curiosity, is the driver of your knowledge!
Continue to try and to ask questins: me and many others will surely answer you.
Ciao and happy splunking.
Giuseppe
P.S. Karma Points are appreciated 😉