Splunk Search

How do I capture the error breakdown and ratio using regex?

kumar497
Path Finder

Hi all,
I have been trying to capture the error breakdown and ratio from the following sample log event, which probably needs a complex regex.

{ [-]
   cluster_id: us-prod-az-200
   kubernetes: { [+]
   }
   log: { [-]
     appVersion: 0.1.326
     envType: prod
     environment: prod-txn
     log: Request and Response, consumerId=xxxxxx-xxxx-xxxx, duration=144, correlationId=0-0-0, requestType=ItemDetails, requestIds=43947812:212001513:217953998:55079684:748708658:42068997:16875745:392480759:138021380:49984819:3933145:54016598:500257082:702903612:50179695:54056450, reqOfferIds=,requestPrimaryMap=, storeIds=0000, status=PARTIAL, responseSize=16, isCustomerAddressPresent=true, extPostalCode=null, fulfillmentIntent=, error=138021380=404.IMS.STORE100;500.IMS.PRICE.103:42068997=400.IMS.STORE.100:3933145=500.IMS.OFFER.100;404.IMS.PRICE.103:212001513=404.IMS.STORE.100:217953998=404.IMS.STORE.100;400.IMS.100:500257082=404.IMS.STORE.100, missingBadgeItems=138021380:702903612:55079684:49984819:54056450:3933145:217953998:392480759,  pickupStoreIds= 
     logLine: 93
     methodName: Utils
     serverName: 11.16.251.37
     time: 2023-02-27 14:43:33.999
     timeStamp: 1677509013999
     type: INFO
   }
   time: 2023-02-27T14:43:33.999844088Z

Each event is unique. The error attribute is a multivalued field with delimiters for each ID (only in case of errors), or it is empty, as shown below:
ex:  error=138021380=404.IMS.STORE100;500.IMS.PRICE.103:42068997=400.IMS.STORE.100:3933145=500.IMS.OFFER.100;404.IMS.PRICE.103:212001513=404.IMS.STORE.100:217953998=404.IMS.STORE.100;400.IMS.100:500257082=404.IMS.STORE.100,

OR

error=,

My requirement is to compute a breakdown of each error code and its error ratio in tabular form:

ratio = count of each error code / total responseSize

Here responseSize is the number of IDs passed in the request for each event.

error              count                     responseSize               ratio
404.IMS.STORE100   aggregation of the error  aggregate of responseSize  round((count/responseSize)*100,2)
500.IMS.PRICE.103  aggregation of the error  aggregate of responseSize
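
To make the "count of each error code" part concrete, here is a minimal Python sketch (outside Splunk) of how one error value breaks down. It assumes the format shown in the sample event: entries separated by ":", each entry being an item ID, "=", and one or more codes separated by ";". The function name count_error_codes is made up for illustration.

```python
from collections import Counter

def count_error_codes(error_value: str) -> Counter:
    """Count error codes in one error value (the text after `error=`)."""
    counts = Counter()
    if not error_value:                      # `error=,` means the event has no errors
        return counts
    for entry in error_value.split(":"):     # one entry per item ID
        _item_id, _, codes = entry.partition("=")
        counts.update(c for c in codes.split(";") if c)
    return counts

# Error value copied from the sample event in the question
sample = ("138021380=404.IMS.STORE100;500.IMS.PRICE.103:42068997=400.IMS.STORE.100:"
          "3933145=500.IMS.OFFER.100;404.IMS.PRICE.103:212001513=404.IMS.STORE.100:"
          "217953998=404.IMS.STORE.100;400.IMS.100:500257082=404.IMS.STORE.100")
counts = count_error_codes(sample)
print(counts["404.IMS.STORE.100"])  # 3
print(sum(counts.values()))         # 9
```

Note that 404.IMS.STORE100 (without the dot) is counted as its own code here, because that is how it appears in the sample event.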

Can someone please help me find a better way to get the error breakdown with the ratio, as per the above requirement?

I was trying to separate out the error codes and aggregate the responseSize, but the search does not give the expected results when tabulating:

index=<index name> "log.envType"=prod "log.methodName"="Utils"
| rex field=_raw "responseSize=*(?<responseSize>.+?),"
| rex field=_raw ", error=*(?<errorMap>.+), missingBadgeItems"
| eval errors0=replace(errorMap, "=", ";")
| eval errors1=split(errors0,":")
| rex field=errors1 "(?<errorCodes>.*)"
| mvexpand errorCodes
| eval code=split(errorCodes, ";")
| mvexpand code
| table code,responseSize 
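
One thing that can be checked outside Splunk is the string handling in this search. The following is a rough Python model of only the replace and split steps (not of Splunk itself, and assuming the rex on errors1 passes each value through unchanged). It suggests that turning "=" into ";" makes each item ID come out as if it were an error code.

```python
error_map = "138021380=404.IMS.STORE.100;500.IMS.PRICE.103:42068997=400.IMS.STORE.100"

# Model of: eval errors0=replace(errorMap, "=", ";") | eval errors1=split(errors0, ":")
errors1 = error_map.replace("=", ";").split(":")

# Model of: eval code=split(errorCodes, ";") applied to each expanded value
codes_with_ids = [part for entry in errors1 for part in entry.split(";")]
print(codes_with_ids)
# ['138021380', '404.IMS.STORE.100', '500.IMS.PRICE.103', '42068997', '400.IMS.STORE.100']

# Dropping everything up to the first "=" before splitting keeps only the codes
codes_only = [code for entry in error_map.split(":")
              for code in entry.partition("=")[2].split(";")]
print(codes_only)
# ['404.IMS.STORE.100', '500.IMS.PRICE.103', '400.IMS.STORE.100']
```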

Can someone please help? Thanks.

bowesmana
SplunkTrust

You can try this

| rex "error=(?<error>[^,]*)"
| eval errors=split(error, ":")
| rex "responseSize=(?<responseSize>\d+)"
| table error errors responseSize
| rex max_match=0 field=errors "^(?<requestId>\d+)=(?<errorCodes>.*)"
| fields - error errors
| eval errorCodes=mvmap(errorCodes, split(errorCodes, ";"))
| stats count avg(responseSize) by errorCodes

That will only get you part of the way, though, as I'm not clear on what your response size needs to be. In your example, there are 3 instances of 404.IMS.STORE.100. If you have another event with 2 instances, where the responseSize is 10, what would you want to see in terms of your responseSize field and ratios?

kumar497
Path Finder

Thanks @bowesmana
The responseSize attribute is the number of items passed in each request. I'm using this field to compute each error code's percentage across all items passed during that time period.

For example, if an event has the 404.IMS.STORE.100 error three times (three items) out of 10 items, I would like to aggregate each such instance against the total number of items for the time period. The total should include the responseSize of events that have no errors, so that the overall item count is covered when computing the ratio.

1st event with 3 error instances 404.IMS.STORE.100 with responseSize=10

2nd event with 5 error instances 404.IMS.STORE.100 with responseSize=25

Expected ratio per error: (3+5)/(10+25)

I'm stuck on combining the error instance count with the total responseSize when computing the ratio in a single streaming search. Each part works individually with stats.

Thanks in advance!!

bowesmana
SplunkTrust

Here is a runnable example using a sample of the data you gave. 

See if this is doing the right thing for you - 

| makeresults
| eval x=split("
     log: Request and Response, consumerId=xxxxxx-xxxx-xxxx, duration=144, correlationId=0-0-0, requestType=ItemDetails, requestIds=43947812:212001513:217953998:55079684:748708658:42068997:16875745:392480759:138021380:49984819:3933145:54016598:500257082:702903612:50179695:54056450, reqOfferIds=,requestPrimaryMap=, storeIds=0000, status=PARTIAL, responseSize=16, isCustomerAddressPresent=true, extPostalCode=null, fulfillmentIntent=, error=138021380=404.IMS.STORE.100;500.IMS.PRICE.103:42068997=400.IMS.STORE.100:3933145=500.IMS.OFFER.100;404.IMS.PRICE.103:212001513=404.IMS.STORE.100:217953998=404.IMS.STORE.100;400.IMS.100:500257082=404.IMS.STORE.100, missingBadgeItems=138021380:702903612:55079684:49984819:54056450:3933145:217953998:392480759,  pickupStoreIds= 
###
     log: Request and Response, consumerId=xxxxxx-xxxx-xxxx, duration=144, correlationId=0-0-0, requestType=ItemDetails, requestIds=43947812:212001513:217953998:55079684:748708658:42068997:16875745:392480759:138021380:49984819, reqOfferIds=,requestPrimaryMap=, storeIds=0000, status=PARTIAL, responseSize=10, isCustomerAddressPresent=true, extPostalCode=null, fulfillmentIntent=, error=138021380=404.IMS.STORE.100;500.IMS.PRICE.103:42068997=400.IMS.STORE.100:3933145=500.IMS.OFFER.100;404.IMS.PRICE.103:212001513=404.IMS.STORE.100:217953998=404.IMS.STORE.100;400.IMS.100, missingBadgeItems=138021380:702903612:55079684:49984819:54056450:3933145:217953998:392480759,  pickupStoreIds= 
###
     log: Request and Response, consumerId=xxxxxx-xxxx-xxxx, duration=144, correlationId=0-0-0, requestType=ItemDetails, requestIds=42068997:138021380, reqOfferIds=,requestPrimaryMap=, storeIds=0000, status=PARTIAL, responseSize=3, isCustomerAddressPresent=true, extPostalCode=null, fulfillmentIntent=, error=138021380=404.IMS.STORE.100;500.IMS.PRICE.103:42068997=400.IMS.STORE.100, missingBadgeItems=138021380:702903612:55079684:49984819:54056450:3933145:217953998:392480759,  pickupStoreIds= 
", "###")
| mvexpand x 
| rename x as _raw
``` THIS IS THE LOGIC FROM HERE DOWN ```
| rex "error=(?<error>[^,]*)"
| eval errors=split(error, ":")
| rex "responseSize=(?<responseSize>\d+)"
| table error errors responseSize
| rex max_match=0 field=errors "^(?<requestId>\d+)=(?<errorCodes>.*)"
| fields - error errors
| eval errorCodes=mvmap(errorCodes, split(errorCodes, ";"))
``` Create a temporary event 'id' ```
| streamstats c as e
``` Count the error codes per event ```
| stats count by errorCodes responseSize e
``` Now get total error code count and total response size for the error codes ```
| stats sum(count) as error_count sum(responseSize) as responseSize by errorCodes
``` Calculate ratio ```
| eval ratio = round(error_count / responseSize * 100, 2)
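
For checking the numbers by hand, here is a small Python model of the arithmetic in the logic above, applied to the three sample events. It is a sketch under two assumptions: a code that appears several times in one event is counted once per appearance, and responseSize is added once per event for each code that event contains. Splunk's own handling of multivalue fields may differ, so treat the printed values as what this model gives, not as guaranteed search output.

```python
from collections import Counter, defaultdict

# (responseSize, error value) taken from the three sample events above
events = [
    (16, "138021380=404.IMS.STORE.100;500.IMS.PRICE.103:42068997=400.IMS.STORE.100:"
         "3933145=500.IMS.OFFER.100;404.IMS.PRICE.103:212001513=404.IMS.STORE.100:"
         "217953998=404.IMS.STORE.100;400.IMS.100:500257082=404.IMS.STORE.100"),
    (10, "138021380=404.IMS.STORE.100;500.IMS.PRICE.103:42068997=400.IMS.STORE.100:"
         "3933145=500.IMS.OFFER.100;404.IMS.PRICE.103:212001513=404.IMS.STORE.100:"
         "217953998=404.IMS.STORE.100;400.IMS.100"),
    (3, "138021380=404.IMS.STORE.100;500.IMS.PRICE.103:42068997=400.IMS.STORE.100"),
]

def codes_in(error_value):
    """Return every error code in one error value, duplicates included."""
    return [code for entry in error_value.split(":") if entry
            for code in entry.partition("=")[2].split(";") if code]

error_count = Counter()
size_sum = defaultdict(int)
for response_size, error_value in events:
    per_event = Counter(codes_in(error_value))   # count by errorCodes, responseSize, e
    for code, n in per_event.items():
        error_count[code] += n                   # sum(count)
        size_sum[code] += response_size          # sum(responseSize), once per event

ratios = {code: round(error_count[code] / size_sum[code] * 100, 2)
          for code in error_count}
print(error_count["404.IMS.STORE.100"], size_sum["404.IMS.STORE.100"],
      ratios["404.IMS.STORE.100"])   # 8 29 27.59
print(error_count["500.IMS.OFFER.100"], size_sum["500.IMS.OFFER.100"],
      ratios["500.IMS.OFFER.100"])   # 2 26 7.69
```

In this model the denominator differs per code: 500.IMS.OFFER.100 does not occur in the third event, so that event's responseSize of 3 is not part of its total.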

kumar497
Path Finder

Thanks @bowesmana, I tried the above approach, but in certain error cases the ratio shows 100%.

Ideally the aggregated responseSize across events should be a single value for a time window, shouldn't it?
Is it possible to multiply (1 / aggregated size of all items) * (error_count per error code) in this use case?
Also, can streamstats be used after the error splitting? The responseSize of non-error events also has to be included to compute the overall item count. Please correct me if I'm wrong.
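
A tiny worked example of how a 100% ratio can come about, written as plain Python arithmetic. The two events and the code name X are hypothetical. The point is the difference between dividing by the responseSize of only those events that contain the code, and dividing by the responseSize of all events.

```python
# Hypothetical events: (responseSize, number of times code X appears)
events = [(2, 2), (50, 0)]

x_count = sum(n for _, n in events)
size_where_x = sum(size for size, n in events if n > 0)   # only events containing X
size_all = sum(size for size, _ in events)                # every event, errors or not

per_code_ratio = round(x_count / size_where_x * 100, 2)
overall_ratio = round(x_count / size_all * 100, 2)
print(per_code_ratio)  # 100.0
print(overall_ratio)   # 3.85
```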

Thanks 

bowesmana
SplunkTrust

I don't understand what you are trying to achieve. 

Can you give an example, using your data, of the numbers you would expect to see under certain conditions? I don't know your data well enough to know what your desired outcome is.

kumar497
Path Finder

The log event is as shown earlier in the thread.

In my log events, the error field is logged with multiple error codes for different item IDs, or with no errors, all within each event as shown below. The requirement is to get a breakdown of each error code with percentages.

error=138021380=404.IMS.STORE.100;500.IMS.PRICE.103:42068997=400.IMS.STORE.100

The number of item IDs passed in each request is logged in the responseSize field, which has been extracted:

responseSize=3

So each event has a different number of errors and a different responseSize. For example, in the event above 3 items are passed, but two of them have 3 different error codes. Similarly, another event has different errors, or no errors, with a different item count. So I would like to compute the error ratio like this:

ratio = (count of each type of error code) / (total number of items in all events)

first error code count = (event1 number of times (404.IMS.STORE.100) + event2 number of times (404.IMS.STORE.100) + ... + eventN number of times (404.IMS.STORE.100))

second error code count = (event1 number of times (500.IMS.PRICE.103) + event2 number of times (500.IMS.PRICE.103) + ... + eventN number of times (500.IMS.PRICE.103))

total number of items = (event1 responseSize + event2 responseSize + ... + eventN responseSize)

Note: responseSize has to be counted for all events, not only those with errors, because the error code percentage is determined over the item count from all events.
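
The formulas above can be sketched end to end in Python as a way to check expected numbers before translating them into a search. This is an illustration only: the function name error_ratios, the shortened raw lines, and the status=OK event are made up, and the two regexes simply mirror the responseSize and error extractions used earlier in the thread.

```python
import re
from collections import Counter

RESPONSE_SIZE = re.compile(r"responseSize=(\d+)")
ERROR_VALUE = re.compile(r"error=([^,]*)")

def error_ratios(raw_events):
    """Return {code: (error_count, total_items, ratio_percent)} over all events."""
    total_items = 0
    error_count = Counter()
    for raw in raw_events:
        size = RESPONSE_SIZE.search(raw)
        if size:
            total_items += int(size.group(1))    # every event counts, errors or not
        err = ERROR_VALUE.search(raw)
        if err and err.group(1):                 # skip events logged as `error=,`
            for entry in err.group(1).split(":"):
                codes = entry.partition("=")[2]
                error_count.update(c for c in codes.split(";") if c)
    return {code: (n, total_items,
                   round(n / total_items * 100, 2) if total_items else None)
            for code, n in error_count.items()}

# Shortened, hypothetical raw lines in the same key=value style as the log event
raw_events = [
    "status=PARTIAL, responseSize=3, fulfillmentIntent=, "
    "error=138021380=404.IMS.STORE.100;500.IMS.PRICE.103:42068997=400.IMS.STORE.100, "
    "missingBadgeItems=138021380",
    "status=OK, responseSize=7, fulfillmentIntent=, error=, missingBadgeItems=",
]
table = error_ratios(raw_events)
print(table["404.IMS.STORE.100"])  # (1, 10, 10.0)
```

The second event has no errors, yet its responseSize of 7 still goes into the shared total of 10, which is the behaviour the note above asks for.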

Expected output:

error              errorcount        total_items                        errorratio
404.IMS.STORE.100  example 62 times  example 14577 (total items count)  62/14577
400.IMS.OFFER.103  example 54 times  example 14577 (total items count)  54/14577
500.IMS.PRICE.103  example 77 times  example 14577 (total items count)  77/14577

So basically the expected outcome is a breakdown of all the different error codes with the ratio for each. I hope I have presented this clearly.

bowesmana
SplunkTrust

If there is a unique ID you can use instead of streamstats c as e, then use that. For example, you have a correlation ID in the body. Is that unique? If so, extract it and replace

| streamstats c as e
| stats count by errorCodes responseSize e

with just

| stats count by errorCodes responseSize YOUR_ID
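
Why a per-event ID matters in that grouping can be shown with a short Python model. The two rows and the helper summed_response_size are hypothetical. The model groups rows by the given keys, keeps one responseSize per group, and then sums per error code, which is roughly what the two stats steps do.

```python
from collections import Counter

# Two hypothetical events, each with responseSize=10 and one 404.IMS.STORE.100
rows = [
    {"event_id": 1, "responseSize": 10, "errorCodes": "404.IMS.STORE.100"},
    {"event_id": 2, "responseSize": 10, "errorCodes": "404.IMS.STORE.100"},
]

def summed_response_size(rows, keys):
    """Group rows by `keys`, then sum one responseSize per group for each code."""
    groups = {tuple(r[k] for k in keys): r for r in rows}   # one row survives per group
    total = Counter()
    for r in groups.values():
        total[r["errorCodes"]] += r["responseSize"]
    return dict(total)

without_id = summed_response_size(rows, ["errorCodes", "responseSize"])
with_id = summed_response_size(rows, ["errorCodes", "responseSize", "event_id"])
print(without_id)  # {'404.IMS.STORE.100': 10}
print(with_id)     # {'404.IMS.STORE.100': 20}
```

Without the ID, two events that share a code and a responseSize collapse into one group, so their sizes are counted once instead of twice. Any replacement ID therefore needs to be unique per event.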