Hello Friends,
I am trying to join the 2 logs with same index using trx_id(here it is called X_Correlation_ID ) but subquery is returning more than 3000K rows hence it is not working.
can someone please help me with another way to join two logs without using "join" command.
index=xyz X_App_ID=abc API_NAME=abc_123 NOT externalURL
|rename X_Correlation_ID AS ID
|table ID
|join ID [search index=xyz "xmlResponseMapping" |rename X_Correlation_ID AS ID |table accountType,accountSubType,ID]
|table ID,accountType,accountSubType
stats is always the way to join datasets together, please remove join from your toolkit, it can always be replaced with a better option and is not the Splunk way to do things. It has numerous side effects that can result in unexpected results, as you are seeing.
@kamlesh_vaghela gives you an example of how to "join" using stats, but one other observation on your example is that you are using table which is a transforming Splunk command, so you should try to use this as late as possible in your SPL as this has consequences on where the data is maniuplated.
If you are just looking to restrict the fields before an operation, use the fields command instead and note that in the stats example, you can still rename theX_Correlation_ID to ID after the stats command, which is a minor optimisation.
Log1 (dataset1)
Splunk Query :- index=xyz X_App_ID=abc API_NAME=abc_123 NOT externalURL
Output of splunk Query :-
2025-05-01 04:54:57.335 : X-Correlation-ID=1234-acbd : X-App-ID=abc : X-Client-ID=kjzoAHK7Bt2vnV5jLQIUuKQZDaXqtJJK : X-Client-Version=6.0.0.3627 : X-Workflow= : serviceType= : API_NAME=abc_123 : COMPLETE_URL=<URL> : Client_IP=<IP>: ApiName=abc_123 : StatusCode=200 : ExecutionTime=234 : Brand=abc_345 : Response={JASON response}
Log2 (dataset2)
Spluk Query :- index=xyz "xmlResponseMapping"
Output of splunk Query :-
2025-05-01 04:54:57.335 : X-Correlation-ID=1234-acbd : xmlResponseMapping : accountType=null, accountSubType=null,
Dataset1 and Dataset2 are connected using "X-Correlation-ID" only and dataset2 has more than 3000K logs for last 8 hrs,while dataset1 has 20-21K logs for last 8hrs
I want "accountType" and "accountSubType" from dataset2 for X-Correlation-ID=<alpha-numeric> where X-App-ID=abc from dataset1.
Dataset2 is having data for multiple "X-App-ID" but doesnt have field "X-App-ID" in logs.
If I try below query then it is giving me output of 3000K (all from dataset2)
index=masapi (X_App_ID=ScamShield API_NAME=COMBINED_FEATURES NOT externalURL) OR (napResponseMapping)
|stats values(accountType) as accountType values(accountSubType) as accountSubType by X_Correlation_ID
Kindly suggest the better way.
OK, got it, so you only want the events from dataset 2 where there is a matching event from dataset 1.
In that case, you can do the stats and retain the X_App_ID using stats values(X_App_ID). I assume there is only ONE X_App_ID per correlation Id.
Then you just filter out the unwanted correlation Ids from dataset 2 where there are no X_App_ID events joined.
index=masapi (X_App_ID=ScamShield API_NAME=COMBINED_FEATURES NOT externalURL) OR (napResponseMapping)
``` The values(X_App_ID) retains the app id for those events in dataset 1 - which will only include ScamShield ```
| stats values(X_App_ID) as X_App_ID values(accountType) as accountType values(accountSubType) as accountSubType by X_Correlation_ID
``` and this will remove all the correlation ids that did not have a corresponding X_App_ID event in dataset 1 ```
| where isnotnull(X_App_ID)
Hope this helps
Try something like this.
index=xyz (X_App_ID=abc API_NAME=abc_123 NOT externalURL) OR ("xmlResponseMapping")
|stats values(accountType) as accountType values(accountSubType) as accountSubType by X_Correlation_ID
KV