Solved: Unable to get expected computed result while joini...

splunkbeginner1 · ‎03-28-2024

I'm attempting to compute the total number of API calls from our backend engine. Initially, I process API identification text logs as events in the engine's index, enabling me to filter respective request IDs. Simultaneously, I process the target_num count within the same index/source. By merging these two logs through a join operation, I filter out all relevant requests to compute the total API calls accurately, achieving the desired outcome.

Subsequently, I aim to enhance this by joining the filtered request IDs with another platform's index/source. Here, I intend to determine the success or failure status of each request at the platform level and then multiply it by the original value of target_num. However, upon combining these queries, I'm experiencing discrepancies in the execution results. I'm uncertain about the missing piece causing this issue.

My Final Query : <x-request-id is an existing field on platform index and there is no rex I am using>
----------------------

index=default-va6* sourcetype="myengine-stage" "API call is True for MyEngine"
| rex field=_raw "request_id=(?<reqID>.+?) - "
| dedup reqID
| join reqID [
      search index=default-va6* sourcetype="myengine-stage" "Target language count"
      | rex field=_raw "request_id=(?<reqID>.+?) - "
      | rex field=_raw "Target language count (?<num_target>\d+)"
      | dedup reqID
      | fields reqID, num_target ]
| fields reqID, num_target
| stats count("reqID") as total_calls by num_target
| eval total_api_calls = total_calls * num_target
| stats sum(total_api_calls) as Total_Requests_Received

| rename reqID AS "x-request-id"
| join "x-request-id" [
       search index=platform-va6 sourcetype="platform-ue*" "Marked request as"
       | eval num_succeed = if(like(message, "Marked request as succeed%"), 1, 0)
       | eval num_failed = if(like(message, "Marked request as failed%"), 1, 0)
       | fields num_succeed, num_failed ]
| fields num_succeed, num_failed
| stats sum(num_succeed) as num_succeed, sum(num_failed) as num_failed
| eval total_succeed_calls = num_succeed * num_target, total_failed_calls = num_failed * num_target

ITWhisperer · ‎03-30-2024

Thanks. It is always best to give an accurate representation of your data as it saves everyone's time.

Try like this:

| makeresults
| eval _raw="2024-03-29 12:25:15,276 _engine - INFO - process - request_id=testabc-012 - user-id=test01 Target language count 1
2024-03-29 12:25:15,276 _engine - INFO - process - request_id=testabc-123 - user-id=test01 Target language count 1
29/03/2024 17:55:14.991	_engine - INFO - process - request_id=testabc-123 - user-id=test01 API call is True for MyEngine
29/03/2024 17:55:14.991	_engine - INFO - process - request_id=testabc-234 - user-id=test01 API call is True for MyEngine
{\"timestamp\":\"2024-03-30T11:28:58.438Z\",\"logger_name\":\"MessageHandler\",\"thread_name\":\"threadPoolTaskExecutor-22\",\"level\":\"INFO\",\"serviceArchPath\":\"worker\",\"process\":\"NA\",\"message\":\"Marked request as succeed. {\\\"status\\\":\\\"PREDICT_SUCCESS\\\",\\\"message\\\":null}, critical Path: {\\\"requestStartTime\\\":1711797346337,\\\"operationsInCriticalPath\\\":{\\\"PREDICT:Feature:01\\\":{\\\"queue_overhead_ms\\\":59,\\\"sdk_overhead_ms\\\":25,\\\"process_time_ms\\\":432265,\\\"total_time_ms\\\":432349},\\\"PREDICT:Feature:02\\\":{\\\"queue_overhead_ms\\\":68,\\\"sdk_overhead_ms\\\":17,\\\"process_time_ms\\\":358611,\\\"total_time_ms\\\":358697},\\\"PLAT_CORE:Orchestrator\\\":{\\\"queue_overhead_ms\\\":142,\\\"process_time_ms\\\":158,\\\"total_time_ms\\\":300,\\\"hbase_overhead_in_ms\\\":136},\\\"PLAT_CORE:inference-core\\\":{\\\"total_time_ms\\\":8,\\\"hbase_overhead_in_ms\\\":5},\\\"PREDICT:Feature:03\\\":{\\\"queue_overhead_ms\\\":78,\\\"sdk_overhead_ms\\\":5,\\\"process_time_ms\\\":663,\\\"total_time_ms\\\":747}},\\\"currentOperationTimingInfo\\\":{},\\\"total_time_ms\\\":792101}\",\"x-request-id\":\"testabc-123\",\"x-service-id\":\"testID\",\"x-api-key\":\"test-client\",\"x-client-id\":\"test-client\",\"invocation_id\":\"test1\",\"x-user-id\":\"test@abc.com\",\"x-access-protected-e\":\"true\",\"trace_id\":\"testabc-0000\",\"trace_flags\":\"00\",\"span_id\":\"b1\"}"
| multikv noheader=t
| table _raw
| rex "Marked request as (?<finalStatus>\w+).+\"x-request-id\":\"(?<reqID>[^\"]+)\""
| rex field=_raw "request_id=(?<reqID>.+?) - .+(Target language count|API call is True for MyEngine)"
| rex field=_raw "Target language count (?<num_target>\d+)"
| rex field=_raw "API call is (?<callTrue>True) for MyEngine"
| stats first(num_target) as num_target first(callTrue) as callTrue first(finalStatus) as finalStatus by reqID
| where callTrue=="True" AND isnotnull(num_target)

View solution in original post

splunkbeginner1 · ‎03-30-2024

@ITWhisperer
Thanks a lot!!

ITWhisperer · ‎03-28-2024

Ideally, you should avoid join if possible.

It looks like the first part of your search could be replaced by this

index=default-va6* sourcetype="myengine-stage" "API call is True for MyEngine" OR "Target language count"
| rex field=_raw "request_id=(?<reqID>.+?) - "
| rex field=_raw "Target language count (?<num_target>\d+)"
| stats first(num_target) as num_target by reqID

For the second join, x-request-id is not returned by the subsearch so the join will fail anyway.

Perhaps there is another way to approach this, but for that we would need some (anonymised) sample events from your data sources, and perhaps a non-SPL definition of what it is you are trying to achieve.

splunkbeginner1 · ‎03-29-2024

Thanks for help!!!

The initial query cannot be replaced with "share query" because the backend engine processes "Logtext - 'target language count'" with a numeric count value for each request/process. Only a subset of these requests triggers an API call, marked as true. Consequently, the engine proceeds to process subsequent logtext events for those requests where the API call value is true. Therefore, my objective is to capture all such requests with a true API call and obtain the corresponding target language count values. This allows me to multiply the language count by the total count of processed requests. Your suggested query retrieves all events across all processed requests where the language count was processed.
Additionally, there's another index, the "platform index," to which the engine processes events with the final request status marked as either "succeed" or "failed." In line with my previous objective, I aim to extract the final status from this index for requests where the API call was true. Consequently, I am attempting to integrate the initial query with a new query for the platform index.

Below are sample events for reference from both of indexes when running the search query:

My Purpose as per Initial shared query : I need to identify request IDs with "API Call" as true, get the count of "Target_Lang" for these IDs, calculate total API calls by multiplying num_lang count with request ID count, and further fetch final status from platform index for filtered request IDs, and determine the count of failed/successful API calls based on status.

Sub_Query_1:
index=default-va6 sourcetype="myengine-stage" ("API call is True for MyEngine" OR "Target language count") "testabc-123"

Event-1:
2024-03-29 12:25:15,276 
_engine - INFO - process - request_id=testabc-123 - user-id=test01 Target language count 1

Event-2:
29/03/2024 17:55:14.991	
_engine - INFO - process - request_id=testabc-123 - user-id=test01 API call is True for MyEngine

Sub_Query_2: 
index=platform-va6 sourcetype="platform-ue*"  "testabc-123" "Marked request as"


29/03/2024 18:01:20.556	
message: Marked request as succeed. {"status":"PREDICT_SUCCESS"}
x-request-id: testabc-123

ITWhisperer · ‎03-29-2024

Try something like this

(index=default-va6* sourcetype="myengine-stage" "API call is True for MyEngine" OR "Target language count") OR (index=platform-va6 sourcetype="platform-ue*"  "testabc-123" "Marked request as")
| rex field=_raw "Marked request as (?<finalStatus>\w+).+ x-request-id: (?<reqID>.+)"
| rex field=_raw "request_id=(?<reqID>.+?) - .+(Target language count|API call is True for MyEngine)"
| rex field=_raw "Target language count (?<num_target>\d+)"
| rex field=_raw "API call is (?<callTrue>True) for MyEngine"
| stats first(num_target) as num_target first(callTrue) as callTrue first(finalStatus) as finalStatus by reqID
| where callTrue=="True" AND isnotnull(num_target)

splunkbeginner1 · ‎03-29-2024

Thanks! @ITWhisperer
This is really helpful.

The only problem is - The shared query I tried and it is not able to fetch the final status as succeed or failed.
As per sample event , platform index has message field which is having this text as marked request as succeed or marked request as failed.

attaching snap for reference.

ITWhisperer · ‎03-30-2024

Here is a runanywhere example, using some mock-up data similar to what you posted. It show the search working. The issue may be that my dummy data is not quite representative of your data. Please examine it to see if any corrections can be made to the dummy data, e.g. I made the events single line whereas as your example seemed to show them as multi-line but I wasn't sure if that was important.

| makeresults
| eval _raw="2024-03-29 12:25:15,276 _engine - INFO - process - request_id=testabc-012 - user-id=test01 Target language count 1
2024-03-29 12:25:15,276 _engine - INFO - process - request_id=testabc-123 - user-id=test01 Target language count 1
29/03/2024 17:55:14.991	_engine - INFO - process - request_id=testabc-123 - user-id=test01 API call is True for MyEngine
29/03/2024 17:55:14.991	_engine - INFO - process - request_id=testabc-234 - user-id=test01 API call is True for MyEngine
29/03/2024 18:01:20.556	message: Marked request as succeed. {\"status\":\"PREDICT_SUCCESS\"} x-request-id: testabc-123"
| multikv noheader=t
| table _raw
``` The lines above set up some dummy data ```
| rex "Marked request as (?<finalStatus>\w+).+ x-request-id: (?<reqID>.+)"
| rex field=_raw "request_id=(?<reqID>.+?) - .+(Target language count|API call is True for MyEngine)"
| rex field=_raw "Target language count (?<num_target>\d+)"
| rex field=_raw "API call is (?<callTrue>True) for MyEngine"
| stats first(num_target) as num_target first(callTrue) as callTrue first(finalStatus) as finalStatus by reqID
| where callTrue=="True" AND isnotnull(num_target)

splunkbeginner1 · ‎03-30-2024

Thanks! You are correct and on dummy data it is working perfectly fine.

Let me share the exact event structure with dummy data on platform index (platform-va6) having final status : snap attached for reference.

upon clicking on show as _raw text -

_raw text 
===========

{"timestamp":"2024-03-30T11:28:58.438Z","logger_name":"MessageHandler","thread_name":"threadPoolTaskExecutor-22","level":"INFO","serviceArchPath":"worker","process":"NA","message":"Marked request as succeed. {\"status\":\"PREDICT_SUCCESS\",\"message\":null}, critical Path: {\"requestStartTime\":1711797346337,\"operationsInCriticalPath\":{\"PREDICT:Feature:01\":{\"queue_overhead_ms\":59,\"sdk_overhead_ms\":25,\"process_time_ms\":432265,\"total_time_ms\":432349},\"PREDICT:Feature:02\":{\"queue_overhead_ms\":68,\"sdk_overhead_ms\":17,\"process_time_ms\":358611,\"total_time_ms\":358697},\"PLAT_CORE:Orchestrator\":{\"queue_overhead_ms\":142,\"process_time_ms\":158,\"total_time_ms\":300,\"hbase_overhead_in_ms\":136},\"PLAT_CORE:inference-core\":{\"total_time_ms\":8,\"hbase_overhead_in_ms\":5},\"PREDICT:Feature:03\":{\"queue_overhead_ms\":78,\"sdk_overhead_ms\":5,\"process_time_ms\":663,\"total_time_ms\":747}},\"currentOperationTimingInfo\":{},\"total_time_ms\":792101}","x-request-id":"testabc-123","x-service-id":"testID","x-api-key":"test-client","x-client-id":"test-client","invocation_id":"test1","x-user-id":"test@abc.com","x-access-protected-e":"true","trace_id":"testabc-0000","trace_flags":"00","span_id":"b1"}

ITWhisperer · ‎03-30-2024

Thanks. It is always best to give an accurate representation of your data as it saves everyone's time.

Try like this:

| makeresults
| eval _raw="2024-03-29 12:25:15,276 _engine - INFO - process - request_id=testabc-012 - user-id=test01 Target language count 1
2024-03-29 12:25:15,276 _engine - INFO - process - request_id=testabc-123 - user-id=test01 Target language count 1
29/03/2024 17:55:14.991	_engine - INFO - process - request_id=testabc-123 - user-id=test01 API call is True for MyEngine
29/03/2024 17:55:14.991	_engine - INFO - process - request_id=testabc-234 - user-id=test01 API call is True for MyEngine
{\"timestamp\":\"2024-03-30T11:28:58.438Z\",\"logger_name\":\"MessageHandler\",\"thread_name\":\"threadPoolTaskExecutor-22\",\"level\":\"INFO\",\"serviceArchPath\":\"worker\",\"process\":\"NA\",\"message\":\"Marked request as succeed. {\\\"status\\\":\\\"PREDICT_SUCCESS\\\",\\\"message\\\":null}, critical Path: {\\\"requestStartTime\\\":1711797346337,\\\"operationsInCriticalPath\\\":{\\\"PREDICT:Feature:01\\\":{\\\"queue_overhead_ms\\\":59,\\\"sdk_overhead_ms\\\":25,\\\"process_time_ms\\\":432265,\\\"total_time_ms\\\":432349},\\\"PREDICT:Feature:02\\\":{\\\"queue_overhead_ms\\\":68,\\\"sdk_overhead_ms\\\":17,\\\"process_time_ms\\\":358611,\\\"total_time_ms\\\":358697},\\\"PLAT_CORE:Orchestrator\\\":{\\\"queue_overhead_ms\\\":142,\\\"process_time_ms\\\":158,\\\"total_time_ms\\\":300,\\\"hbase_overhead_in_ms\\\":136},\\\"PLAT_CORE:inference-core\\\":{\\\"total_time_ms\\\":8,\\\"hbase_overhead_in_ms\\\":5},\\\"PREDICT:Feature:03\\\":{\\\"queue_overhead_ms\\\":78,\\\"sdk_overhead_ms\\\":5,\\\"process_time_ms\\\":663,\\\"total_time_ms\\\":747}},\\\"currentOperationTimingInfo\\\":{},\\\"total_time_ms\\\":792101}\",\"x-request-id\":\"testabc-123\",\"x-service-id\":\"testID\",\"x-api-key\":\"test-client\",\"x-client-id\":\"test-client\",\"invocation_id\":\"test1\",\"x-user-id\":\"test@abc.com\",\"x-access-protected-e\":\"true\",\"trace_id\":\"testabc-0000\",\"trace_flags\":\"00\",\"span_id\":\"b1\"}"
| multikv noheader=t
| table _raw
| rex "Marked request as (?<finalStatus>\w+).+\"x-request-id\":\"(?<reqID>[^\"]+)\""
| rex field=_raw "request_id=(?<reqID>.+?) - .+(Target language count|API call is True for MyEngine)"
| rex field=_raw "Target language count (?<num_target>\d+)"
| rex field=_raw "API call is (?<callTrue>True) for MyEngine"
| stats first(num_target) as num_target first(callTrue) as callTrue first(finalStatus) as finalStatus by reqID
| where callTrue=="True" AND isnotnull(num_target)

Unable to get expected computed result while joining second query on different index

eval

join

stats

subsearch

transaction

Introducing the Splunk Community Dashboard Challenge!

Get the T-shirt to Prove You Survived Splunk University Bootcamp

Wondering How to Build Resiliency in the Cloud?