Why the error capturing using regex?

kumar497 · ‎02-27-2023

Hi all,
I have been trying to capture the error breakdown and error ratio from the following sample log event, which probably needs a complex regex.

{ [-]
   cluster_id: us-prod-az-200
   kubernetes: { [+]
   }
   log: { [-]
     appVersion: 0.1.326
     envType: prod
     environment: prod-txn
     log: Request and Response, consumerId=xxxxxx-xxxx-xxxx, duration=144, correlationId=0-0-0, requestType=ItemDetails, requestIds=43947812:212001513:217953998:55079684:748708658:42068997:16875745:392480759:138021380:49984819:3933145:54016598:500257082:702903612:50179695:54056450, reqOfferIds=,requestPrimaryMap=, storeIds=0000, status=PARTIAL, responseSize=16, isCustomerAddressPresent=true, extPostalCode=null, fulfillmentIntent=, error=138021380=404.IMS.STORE100;500.IMS.PRICE.103:42068997=400.IMS.STORE.100:3933145=500.IMS.OFFER.100;404.IMS.PRICE.103:212001513=404.IMS.STORE.100:217953998=404.IMS.STORE.100;400.IMS.100:500257082=404.IMS.STORE.100, missingBadgeItems=138021380:702903612:55079684:49984819:54056450:3933145:217953998:392480759,  pickupStoreIds= 
     logLine: 93
     methodName: Utils
     serverName: 11.16.251.37
     time: 2023-02-27 14:43:33.999
     timeStamp: 1677509013999
     type: INFO
   }
   time: 2023-02-27T14:43:33.999844088Z

Each event is unique. The error attribute is a multivalued field with delimiters for each ID (only in case of errors), or it is empty, as shown below:
ex: error=138021380=404.IMS.STORE100;500.IMS.PRICE.103:42068997=400.IMS.STORE.100:3933145=500.IMS.OFFER.100;404.IMS.PRICE.103:212001513=404.IMS.STORE.100:217953998=404.IMS.STORE.100;400.IMS.100:500257082=404.IMS.STORE.100,

OR

error=,

My requirement is to compute a breakdown of each error code and its error ratio in tabular form.

ratio = count of each error code / total responseSize

Here responseSize is the number of IDs passed in the request for each event.

error	count	responseSize	ratio
404.IMS.STORE100	aggregation of the error	aggregate of responseSize	round((count/responseSize)*100,2)
500.IMS.PRICE.103	aggregation of the error	aggregate of responseSize

Can someone please help me find a better way to get the error breakdown with ratios as per the above requirement?

I was trying to separate out the error codes and aggregate the responseSize, but the search does not give the expected results when tabulating:

index=<index name> "log.envType"=prod "log.methodName"="Utils"
| rex field=_raw "responseSize=*(?<responseSize>.+?)," 
| rex field=_raw ", error=*(?<errorMap>.+), missingBadgeItems"
| eval errors0=replace(errorMap, "=", ";")
| eval errors1=split(errors0,":")
| rex field=errors1 "(?<errorCodes>.*)"
| mvexpand errorCodes
| eval code=split(errorCodes, ";")
| mvexpand code
| table code,responseSize

Can someone please help? Thanks

bowesmana · ‎02-27-2023

You can try this

| rex "error=(?<error>[^,]*)"
| eval errors=split(error, ":")
| rex "responseSize=(?<responseSize>\d+)"
| table error errors responseSize
| rex max_match=0 field=errors "^(?<requestId>\d+)=(?<errorCodes>.*)"
| fields - error errors
| eval errorCodes=mvmap(errorCodes, split(errorCodes, ";"))
| stats count avg(responseSize) by errorCodes

although that will only get you part of the way, as I'm not clear what your response size needs to be. In your example, there are 3 instances of 404.IMS.STORE.100 and if you have another event with 2 instances, where the responseSize is 10, what would you want to see in terms of your responseSize field and ratios?

kumar497 · ‎02-27-2023

Thanks @bowesmana
The responseSize attribute is the number of items passed in each request. I'm using this field to compute the error code percentage across all the items passed during the time window.

For example, if an event has the 404.IMS.STORE.100 error three times (three items) out of 10 items, I would like to sum such instances across events and divide by the total number of items for the time window. The total should include the responseSize of events that have no errors, so that the overall item count is covered in the ratio.

1st event with 3 error instances 404.IMS.STORE.100 with responseSize=10

2nd event with 5 error instances 404.IMS.STORE.100 with responseSize=25

expected ratio per error (3+5)/(10+25)

I'm stuck on combining the per-error instance count with the total responseSize when computing the ratio in a streaming fashion; each one works individually with stats.

Thanks in advance!!

bowesmana · ‎02-27-2023

Here is a runnable example using a sample of the data you gave.

See if this is doing the right thing for you -

| makeresults
| eval x=split("
     log: Request and Response, consumerId=xxxxxx-xxxx-xxxx, duration=144, correlationId=0-0-0, requestType=ItemDetails, requestIds=43947812:212001513:217953998:55079684:748708658:42068997:16875745:392480759:138021380:49984819:3933145:54016598:500257082:702903612:50179695:54056450, reqOfferIds=,requestPrimaryMap=, storeIds=0000, status=PARTIAL, responseSize=16, isCustomerAddressPresent=true, extPostalCode=null, fulfillmentIntent=, error=138021380=404.IMS.STORE.100;500.IMS.PRICE.103:42068997=400.IMS.STORE.100:3933145=500.IMS.OFFER.100;404.IMS.PRICE.103:212001513=404.IMS.STORE.100:217953998=404.IMS.STORE.100;400.IMS.100:500257082=404.IMS.STORE.100, missingBadgeItems=138021380:702903612:55079684:49984819:54056450:3933145:217953998:392480759,  pickupStoreIds= 
###
     log: Request and Response, consumerId=xxxxxx-xxxx-xxxx, duration=144, correlationId=0-0-0, requestType=ItemDetails, requestIds=43947812:212001513:217953998:55079684:748708658:42068997:16875745:392480759:138021380:49984819, reqOfferIds=,requestPrimaryMap=, storeIds=0000, status=PARTIAL, responseSize=10, isCustomerAddressPresent=true, extPostalCode=null, fulfillmentIntent=, error=138021380=404.IMS.STORE.100;500.IMS.PRICE.103:42068997=400.IMS.STORE.100:3933145=500.IMS.OFFER.100;404.IMS.PRICE.103:212001513=404.IMS.STORE.100:217953998=404.IMS.STORE.100;400.IMS.100, missingBadgeItems=138021380:702903612:55079684:49984819:54056450:3933145:217953998:392480759,  pickupStoreIds= 
###
     log: Request and Response, consumerId=xxxxxx-xxxx-xxxx, duration=144, correlationId=0-0-0, requestType=ItemDetails, requestIds=42068997:138021380, reqOfferIds=,requestPrimaryMap=, storeIds=0000, status=PARTIAL, responseSize=3, isCustomerAddressPresent=true, extPostalCode=null, fulfillmentIntent=, error=138021380=404.IMS.STORE.100;500.IMS.PRICE.103:42068997=400.IMS.STORE.100, missingBadgeItems=138021380:702903612:55079684:49984819:54056450:3933145:217953998:392480759,  pickupStoreIds= 
", "###")
| mvexpand x 
| rename x as _raw
``` THIS IS THE LOGIC FROM HERE DOWN ```
| rex "error=(?<error>[^,]*)"
| eval errors=split(error, ":")
| rex "responseSize=(?<responseSize>\d+)"
| table error errors responseSize
| rex max_match=0 field=errors "^(?<requestId>\d+)=(?<errorCodes>.*)"
| fields - error errors
| eval errorCodes=mvmap(errorCodes, split(errorCodes, ";"))
``` Create a temporary event 'id' ```
| streamstats c as e
``` Count the error codes per event ```
| stats count by errorCodes responseSize e
``` Now get total error code count and total response size for the error codes ```
| stats sum(count) as error_count sum(responseSize) as responseSize by errorCodes
``` Calculate ratio ```
| eval ratio = round(error_count / responseSize * 100, 2)

kumar497 · ‎02-27-2023

Thanks @bowesmana, I tried the above approach, but for certain error codes the ratio shows 100%.

Ideally the aggregate of responseSize over a time window should be a single value, shouldn't it?
Is it possible to multiply (1 / aggregated size of all items) * (error_count per error code) in this use case?
Also, can streamstats be used after the error splitting? The responseSize of non-error events also has to be included to compute the overall item count. Please correct me if I'm wrong.

Thanks

bowesmana · ‎02-27-2023

I don't understand what you are trying to achieve.

It would help if you could give an example, using your data, of the numbers you would expect to see under certain conditions. I don't know your data well enough to know what your desired outcome is.

kumar497 · ‎02-28-2023

The log event is as shown earlier in the thread.

In my log events the error field is logged with multiple error codes for different item IDs, or with no errors, all within a single event as shown below. The requirement is to get a breakdown of each error code with percentages.

error=138021380=404.IMS.STORE.100;500.IMS.PRICE.103:42068997=400.IMS.STORE.100

The number of item IDs passed in each request is logged in the responseSize field, which has been extracted:

responseSize=3

So each event has a different number of errors and a different responseSize. For example, in the event above 3 items were passed and two of them have 3 different error codes between them. Another event may have different errors, or no errors, with a different number of items. So I would like to compute the error ratio like this:

ratio = (count of each type of error code) / (total number of items in all events)

first error code count = (event1 occurrences of 404.IMS.STORE.100 + event2 occurrences of 404.IMS.STORE.100 + ... + eventN occurrences of 404.IMS.STORE.100)

second error code count = (event1 occurrences of 500.IMS.PRICE.103 + event2 occurrences of 500.IMS.PRICE.103 + ... + eventN occurrences of 500.IMS.PRICE.103)

total number of items = (event1 responseSize + event2 responseSize + ... + eventN responseSize)

Note: responseSize has to be counted for all events, not only those with errors, because the error code percentage is based on the total number of items from all events.

expected output

error	errorcount	total_items	errorratio
404.IMS.STORE.100	example 62 times	example 14577(total items count)	62/14577
400.IMS.OFFER.103	example 54 times	example 14577(total items count)	54/14577
500.IMS.PRICE.103	example 77 times	example 14577(total items count)	77/14577

So basically the expected outcome is a breakdown of all the different error codes with their percentages. I hope I have explained it clearly.

bowesmana · ‎02-27-2023

If there is a unique ID you can use instead of streamstats c as e, then use that. e.g. you have a correlation id in the body - is that unique - if so, extract it and replace the

| streamstats c as e
| stats count by errorCodes responseSize e

with just

| stats count by errorCodes responseSize YOUR_ID
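
For the requirement described earlier in the thread, where the total item count has to cover every event including the error-free ones, one possible variation is to total responseSize with eventstats before the error codes are split out. In the earlier search, sum(responseSize) is taken by errorCodes, so only events that contain a given code contribute to its denominator, which may be why some ratios came out at 100%. The following is only a sketch and has not been run against the real index: the three makeresults events are made-up sample data, and the field names pair, code, errorcount, total_items and errorratio are placeholders. It also assumes that an empty error= leaves the error field unset, so that such events still count towards total_items but drop out of the final stats by code.

````spl
| makeresults
| eval x=split("responseSize=10, error=1=404.IMS.STORE.100;500.IMS.PRICE.103:2=404.IMS.STORE.100, missingBadgeItems=###responseSize=25, error=, missingBadgeItems=###responseSize=5, error=7=404.IMS.STORE.100, missingBadgeItems=", "###")
| mvexpand x
| rename x as _raw
``` THIS IS THE LOGIC FROM HERE DOWN ```
| rex "responseSize=(?<responseSize>\d+)"
| rex "error=(?<error>[^,]*)"
``` Total items over every event, including those with no errors ```
| eventstats sum(responseSize) as total_items
``` One row per id=code;code pair ```
| eval pair=split(error, ":")
| mvexpand pair
``` Drop the item id, keep the codes, then one row per code ```
| eval code=split(mvindex(split(pair, "="), 1), ";")
| mvexpand code
``` Count each code and carry the overall total alongside it ```
| stats count as errorcount max(total_items) as total_items by code
| eval errorratio=round(errorcount / total_items * 100, 2)
````

To try it on real data, replace everything above the logic comment with the base search. On large result sets eventstats and mvexpand can run into memory limits, so it may be worth checking the job inspector for truncation warnings.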
