Join Two Queries With Different Raw Messages But a Common Thread ID

rishav-ukg
Loves-to-Learn Lots

Below are two queries that return different events but share a common field, thread_id, which can be extracted with the rex below.
The raw message logs differ between the two queries.
I want a list of events containing the raw messages from both queries, but only for thread_id values that appear in both.

I have tried several approaches, including join, append, map, and GitHub Copilot, but none gave the desired results.
Can somebody please help me achieve this?

Rex:
rex field=_raw "\*{4}(?<thread_id>\d+)\*"

Query 1:
index="*sample-app*" ("*504 Gateway Time-out*" AND "*Error code: 6039*")

Query 2:
index="*sample-app*" "*ExecuteFactoryJob: Caught soap exception*"

index="*wfd-rpt-app*" ("*504 Gateway Time-out*" AND "*Error code: 6039*")
| rex field=_raw "\*{4}(?<thread_id>\d+)\*"
| append [ search index="*wfd-rpt-app*" "*ExecuteFactoryJob: Caught soap exception*" | rex field=_raw "\*{4}(?<thread_id>\d+)\*" ]
| stats values(_raw) as raw_messages by _time, thread_id
| table _time, thread_id, raw_messages

The query above returns some correct results that contain the raw messages from both queries. However, other results contain a thread_id with only the 504 Gateway Time-out message, even though that thread_id has both types of message when I check each query separately.

I'm new to Splunk, so any help is really appreciated.


isoutamo
SplunkTrust

Hi

Have you tried something like this?

index="*wfd-rpt-app*" ("*504 Gateway Time-out*" AND "*Error code: 6039*")
OR "*ExecuteFactoryJob: Caught soap exception*"
| rex field=_raw "\*{4}(?<thread_id>\d+)\*"
| stats values(_raw) as raw_messages by _time, thread_id
| table _time, thread_id, raw_messages

Are you sure that you want to (or can) use _time in the by clause? Grouping by _time means the events must have exactly the same timestamp, down to the millisecond or finer, to end up in the same row.

If this doesn't work for you, please share some sample data that we can use to better understand your case. An example of the output you expect from that data would also be valuable.
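
If the time is still wanted in the output, one option is to aggregate it instead of grouping by it. This is only a rough sketch that reuses the search above; first_seen and last_seen are illustrative field names, not fields from the original data:

```spl
index="*wfd-rpt-app*" ("*504 Gateway Time-out*" AND "*Error code: 6039*")
OR "*ExecuteFactoryJob: Caught soap exception*"
| rex field=_raw "\*{4}(?<thread_id>\d+)\*"
| stats min(_time) as first_seen max(_time) as last_seen values(_raw) as raw_messages by thread_id
| convert ctime(first_seen) ctime(last_seen)
| table thread_id, first_seen, last_seen, raw_messages
```

This way, events that share a thread_id land in one row even when their timestamps differ.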

r. Ismo


rishav-ukg
Loves-to-Learn Lots

Thank you for your response.


I'll use an example to explain it better.

Let's say we have the following two events in Splunk:

1.
****1111222*abcabcabac*ERROR*<time>
Logging server exception...
Error code: 6039
Error description: An internal message proxy HTTP error occurred when the request was being processed Parameter: 504 Gateway Time-out

2.
****1111222*xyzxyz*0078*ERROR*<time>
ExecuteFactoryJob: Caught soap exception.
Java factory ID: 3910059732_3_0_223344
Request failed after tries = 1


These are two separate events in Splunk, which can be fetched by query 1 and query 2 respectively. First I search for the 504 Gateway Time-out with error code 6039 and extract the thread_id (1111222 in the example above) using rex. Then I use this thread_id to look for the second kind of event, as shown in the example above. If it is found, I want a single result returned that contains the raw messages of both events.

It's like an inner join.
If either event is missing, nothing should be returned for that thread_id. I tried using join as well, but it didn't work.
I tried your query. Some of the results contain a single event, and some contain both events grouped together (which is what I expect).
Every result should contain the raw messages of both example events.

For the example above, the result should be a single event like this:

****1111222*abcabcabac*ERROR*<time>
Logging server exception...
Error code: 6039
Error description: An internal message proxy HTTP error occurred when the request was being processed Parameter: 504 Gateway Time-out
****1111222*xyzxyz*0078*ERROR*<time>
ExecuteFactoryJob: Caught soap exception.
Java factory ID: 3910059732_3_0_223344
Request failed after tries = 1


@isoutamo 
@ITWhisperer 


ITWhisperer
SplunkTrust

Using wildcards at the beginning and end of search strings is not necessary (or advised), and narrowing the indexes you search might also improve matters. As @isoutamo says, using _time in the by clause may not give you what you expect, because you will get a separate result row for each combination of _time and thread_id. Also, AND is implied between search terms and is therefore unnecessary in this instance.

Try something like this:

index="wfd-rpt-app" ("504 Gateway Time-out" "Error code: 6039")
OR "ExecuteFactoryJob: Caught soap exception"
| rex field=_raw "\*{4}(?<thread_id>\d+)\*"
| stats values(_raw) as raw_messages by thread_id
| table thread_id, raw_messages

The time of each of the events is likely to be in the _raw message, but if you want that broken out in some way, please provide some sample raw event data (anonymised appropriately) and a description / example of your expected results.
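
Note that the stats above still returns a row for a thread_id that has only one of the two message types. To get the inner-join behaviour asked for, one option is to tag each event with its type and keep only the thread_ids where both types are present. This is a minimal sketch, assuming the index name used above. msg_type and type_count are field names made up for illustration. like() is case-sensitive, so the pattern has to match the log text exactly:

```spl
index="wfd-rpt-app" (("504 Gateway Time-out" "Error code: 6039") OR "ExecuteFactoryJob: Caught soap exception")
| rex field=_raw "\*{4}(?<thread_id>\d+)\*"
| eval msg_type=if(like(_raw, "%ExecuteFactoryJob: Caught soap exception%"), "soap", "gateway")
| stats values(_raw) as raw_messages dc(msg_type) as type_count by thread_id
| where type_count=2
| table thread_id, raw_messages
```

values(_raw) returns the distinct messages as a multivalue field in lexical order; list(_raw) could be swapped in if the original event order matters.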
