Splunk Search

Why am I receiving error trend stats with multiple error codes from each event?

kumar497
Path Finder

Hi All, 

I am trying to tabulate the error ratio based on the following scenarios from the unique log event but further using the regex to split the error code causing the total events to be filtered out causing the overall hits to be incorrect while % calculation as the initial no of unique events is not getting preserved with eventstats 

sample log event is in json format as below and multiple errorcodes in same log event which needs a error wise split

log:<<field1>>,<<field2>>,<<field3>>,error=60KOANEWLH=500.EBS.SYSTEM.100:67MPW4X79FOJ=500.IMS.SERVEROUT.100:3534U6ZIZY39=500.EBS.SERVERIN.100;404.IMS.SERVEROUT.105:3M8TEWEKVIJK=500.IVS.XXXXX.100;404.IMS.XXXX.105:2ILTH9G0UMG1=500.IMS.XXXXXXXX.100:0UAQL48U2KWF=500.EBS.XXXXXXX.100;404.IMS.XXXXXXXXX.105, missingFulfillmentItems,<<field4>>,<<field5>>,<<field6>>

i would like to get each error code % mainly (500.XX.XXXX.100 count/total hits) 

below is the splunk search filter been used but not getting totalevents, please correct me if there is anything missed,,could someone please assist with an alternate option to compute the error trend..Thanks in advance

index=<indexname> "Search String" "Type"=prod  | eventstats sum(index) as total_hits
| rex field="log.log" ", error=*(?<errorMap2>.+), missingFulfillmentItems"
| eval errors0=replace(errorMap2, "=", ";")
| eval errors1=split(errors0,":")
| rex field=errors1 "(?<errorCodes>.*)"
| mvexpand errorCodes
| eval errorCodes1=split(errorCodes, ";")
| mvexpand errorCodes1
| where like(errorCodes1,"%500.IMS.%")
| stats count by errorCodes1,total_hits

Note: each log event is unique and has multiple error codes with in the event or no error codes in the event if its success

Labels (1)
0 Karma

PickleRick
SplunkTrust
SplunkTrust

Phew. That first sentence is lost on me. And lack of capitals and punctuation doesn't help. To be honest, I don't understand what you're trying to do.

And your event is definitely not in json format. Maybe that's one of the json fields, but that's not a json event.

I don't understand the "eventstats sum(index)" part. You're trying to calculate a sum of the string field. That's most probably not what you want. Maybe you simply need "eventstats count"?

BTW, you can probably match your error codes much easier. After getting your errorMap2 just use rex to find all error codes.

| rex max_match=0 field=errorMap2 "=(?<errorcode>\d{3}[A-za-z0-9.]+)"

Now you can do with those matches whatever you want.

0 Karma

kumar497
Path Finder

Thanks @PickleRick  for your inputs 

yeah total event is in json and the field im trying to extract is part of a "log" attribute in the json event 

my intention is to compute each errorcode percentage out of the total no of requests that is logged, In this usecase  error codes will be either multiple errors in a single event or no errors, so trying to extract each errorcode and also required to preserve total no of requests as "eventstats count" in the initial phase bcz further error filtering of 5xx(with like function) yields in less num of events hope i made the requirement clear

Thanks i can use the below rex for error extraction which is helpful but still i am unable to compute the error % for a larger duration as the jobs getting finalized exceeding the user quota seems im missing something,

0 Karma

PickleRick
SplunkTrust
SplunkTrust

Apart from the obvious techniques which @ITWhisperer already showed - removing fields to reduce memory footprint, you can do something else. Move the calculations elsewhere 😉

Drop the eventstats completely. Add a fake fallback value for those events that do not have any errors at all. Then add a fake "count" which will contain value of 1/(number of errors) per event. This way you can do sum(fake_count) by error with your stats. This will give you proper overall count split among different error values. Now you can do eventstats sum on this fake sum which should give you relatively good approximation (might be a tiny bit off due to rounding errors) to the original count.

Edit: Alternatively, you could substitute eventstats with streamstats count which will "number" your events. Then if you get max(count) in your stats, you can easily choose max of those values to be the sequence  number of your last event effectively being the count of your events. And don't forget about the fake error to account for events with no error.

0 Karma

kumar497
Path Finder

Thanks @PickleRick  for the inputs 

0 Karma

ITWhisperer
SplunkTrust
SplunkTrust

Shouldn't

| eventstats sum(index) as total_hits

be

| eventstats count as total_hits
0 Karma

kumar497
Path Finder

Thanks @ITWhisperer  you are right 

I modified the search as below to compute the error ratio but disk usage is exceeding the configured limits getting the job autofinalized showing 0 events with increased time window but working only with very short duration, Is there a way to optimize further can someone please help 



index=<indexname> "Search String XXXXXXXXX" "Type"=prod  

| eventstats count as total_hits
| rex field="log.log" ", errorMap=*(?<errorMap2>.+),missingFulfillmentItems"
| eval errors0=replace(errorMap2, "=", ";")
| eval errors1=split(errors0,":")
| rex field=errors1 "(?<errorCodes>.*)"
| mvexpand errorCodes
| eval errorCodes1=split(errorCodes, ";")
| mvexpand errorCodes1
| where like(errorCodes1,"%500.IMS.%")
| stats count(errorCodes1) as error_count by errorCodes1,total_hits
| eval error_ratio=round((error_count/total_hits)*100,2)
| table errorCodes1,error_count,error_ratio,total_hits
| sort -error_ratio

0 Karma

ITWhisperer
SplunkTrust
SplunkTrust

By doing the eventstats count early (before the mvexpands) you are counting the events as total_hits. Your ratio is then comparing each error against the number of events, so, for example, you could have 4 events, one of which has 4 500 errors and the other three have no errors. This would give a ratio of 100%. Is this really what you are intending to do?

0 Karma

kumar497
Path Finder

yes right the event counts will not match, my intention is to compute each errorcode percentage out of the total no of requests that is logged, In this usecase  error codes will be either multiple errors in a single event or no errors, so trying to extract each errorcode and also required to preserve total no of requests as "eventstats count" in the initial phase bcz further error filtering of 5xx(with like function) yields in less num of events hope i made the requirement clear


0 Karma

ITWhisperer
SplunkTrust
SplunkTrust

Try something like this to reduce memory requirements as you go

index=<indexname> "Search String XXXXXXXXX" "Type"=prod  
``` Possibly don't need _raw as you appear to have already extracted log.log? ```
| fields - _raw 
``` Count events with streamstats (to keep the pipeline moving) ```
| streamstats count as row 
| rex field="log.log" ", errorMap=*(?<errorMap2>.+),missingFulfillmentItems"
``` Remove fields when you are finished with them ```
| fields - 'log.log'
| eval errors0=replace(errorMap2, "=", ";")
``` Remove fields when you are finished with them ```
| fields - errorMap2
| eval errors1=split(errors0,":")
``` Remove fields when you are finished with them ```
| fields - errors0
| rex field=errors1 "(?<errorCodes>.*)"
``` Remove fields when you are finished with them ```
| fields - errors1
| mvexpand errorCodes
| eval errorCodes1=split(errorCodes, ";")
``` Remove fields when you are finished with them ```
| fields - errorCodes
| mvexpand errorCodes1
``` Use max(row) as your event count ```
| stats count as error_count max(row) as total_hits by errorCodes1
``` Only keep error counts for the codes you are interested in - moved to after stats so that correct max(row) is maintained ```
| where like(errorCodes1,"%500.IMS.%")
| eval error_ratio=round((error_count/total_hits)*100,2)
| table errorCodes1,error_count,error_ratio,total_hits
| sort -error_ratio
0 Karma

kumar497
Path Finder

Thank you so much @ITWhisperer  this  is really helpful but only constraint was around the mvexpand where the memory limits are reaching without sampling, though its working if including sampling ratio 

just would like to understand max(row) is the sequence number of each event after splitting the error codes isnt it, will it not generate too many events ,sorry i might be missing something 

stats count as error_count max(row) as total_hits by errorCodes1

 

0 Karma

ITWhisperer
SplunkTrust
SplunkTrust

Actually, it looks like there might be an extra line needed

``` Use max(row) as your event count ```
| stats count as error_count max(row) as total_hits by errorCodes1
``` Set total_hits to max before filtering out any events ```
| eventstats max(total_hits) as total_hits
``` Only keep error counts for the codes you are interested in - moved to after stats so that correct max(row) is maintained ```

 Moving the eventstats to here (after the stats by errorCodes1) means it is only processing the stats events rather than all the events.

Also, you should bear in mind the hint from @PickleRick about having a fake error code so that the last row has an error code that appears in errorCodes1 i.e. non-null

0 Karma

PickleRick
SplunkTrust
SplunkTrust

In fact, the percentages can sum to much more than 100%

Remember that if you're statsing by multivalued fields (or if you count all events early and mvexpand your fields later) the same original event can be counted multiple times.

Percentage counted this way does make sense but one has to understand what sense it is.

0 Karma

PickleRick
SplunkTrust
SplunkTrust

It's most probably because eventstats is a dataset processing command which means that it needs to get all the events from your search before it can process them. That's not good for big searches.

You could try to "cheat" by using streamstats and then chosing max value of your streamstats.

0 Karma
Get Updates on the Splunk Community!

Developer Spotlight with Paul Stout

Welcome to our very first developer spotlight release series where we'll feature some awesome Splunk ...

State of Splunk Careers 2024: Maximizing Career Outcomes and the Continued Value of ...

For the past four years, Splunk has partnered with Enterprise Strategy Group to conduct a survey that gauges ...

Data-Driven Success: Splunk & Financial Services

Splunk streamlines the process of extracting insights from large volumes of data. In this fast-paced world, ...