Hi Everyone,
I have a requirement.
I have created an alert like the one below:
index=abc ns IN ("blazepsfpublish", "blazegateway", "blazegateway-c2", "blazepsfsubscribememsql", "blazepsfsubscribememsql-c2", "sidh-bulk-processor", "sidh-datagraph3", "sidh-datagraph3-c2", "sidh-noss") "NullPointerException"
| rex "message=(?<ExceptionMessage>[^\n]+)"
| dedup ExceptionMessage, ns
| eval _time = strftime(_time, "%Y-%m-%d %H:%M:%S.%3N")
| table app_name, ExceptionMessage, _time, environment, pod_name, ns
| rename app_name as APP_NAME, _time as Time, environment as Environment, pod_name as Pod_Name
The issue I am facing is that some of the messages are identical, like the two below:
2021-03-17T10:39:32.268286963Z app_name=publishpushapi environment=e1 ns=blazepsfpublish pod_container=publishpushapi pod_name=publishpushapi-deployment-66-gz8dm stream=stdout message=java.lang.NullPointerException: null
2021-03-17T10:39:16.982803933Z app_name=publishpushapi environment=e1 ns=blazepsfpublish pod_container=publishpushapi pod_name=publishpushapi-deployment-66-gz8dm stream=stdout message=java.lang.NullPointerException: null
I have already used dedup.
But I want the count to come out correctly: if there are 7 similar messages, the message should be displayed once with a count of 7.
With stats count I am getting a count of only 1.
Can someone guide me on this?
At what point in your query are you doing the stats command?
I have used it like this:
index=abc ns IN ("blazepsfpublish", "blazegateway", "blazegateway-c2", "blazepsfsubscribememsql", "blazepsfsubscribememsql-c2", "sidh-bulk-processor", "sidh-datagraph3", "sidh-datagraph3-c2", "sidh-noss") "NullPointerException"
| rex "message=(?<ExceptionMessage>[^\n]+)"
| dedup ExceptionMessage, ns
| eval _time = strftime(_time, "%Y-%m-%d %H:%M:%S.%3N")
| stats count by app_name, ExceptionMessage, _time, environment, pod_name, ns
| rename app_name as APP_NAME, _time as Time, environment as Environment, pod_name as Pod_Name
But I am getting a count of 1 even when some of the messages are similar.
The dedup before the stats has removed all but the first event matching each combination of ExceptionMessage and ns, which is why your counts will always be 1.
What should I do to get the correct count?
Try removing the dedup
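As a rough sketch of that suggestion, the search without dedup could look like the one below. Grouping only on ExceptionMessage and ns is my assumption about what "similar messages" means here (it mirrors the fields you were deduplicating on), so adjust the by clause to taste:

```
index=abc ns IN ("blazepsfpublish", "blazegateway", "blazegateway-c2", "blazepsfsubscribememsql", "blazepsfsubscribememsql-c2", "sidh-bulk-processor", "sidh-datagraph3", "sidh-datagraph3-c2", "sidh-noss") "NullPointerException"
| rex "message=(?<ExceptionMessage>[^\n]+)"
| stats count by ExceptionMessage, ns
```

If the two sample events you posted were the only matches, this should return one row for "java.lang.NullPointerException: null" in blazepsfpublish with a count of 2, because stats does the grouping that dedup was doing while still counting every event.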
I tried it like this after removing the dedup, but I am getting no results:
index=abc ns IN ("blazepsfpublish", "blazegateway", "blazegateway-c2", "blazepsfsubscribememsql", "blazepsfsubscribememsql-c2", "sidh-bulk-processor", "sidh-datagraph3", "sidh-datagraph3-c2", "sidh-noss") "NullPointerException"
| rex "message=(?<ExceptionMessage>[^\n]+)"
| eval _time = strftime(_time, "%Y-%m-%d %H:%M:%S.%3N")
| stats count by app_name, ExceptionMessage, _time, environment, pod_name, ns
| rename app_name as APP_NAME, _time as Time, environment as Environment, pod_name as Pod_Name
Can you guide me on where I am going wrong?
I am not sure why you would get "No result". Do you get an error? Also, what are you trying to do with the stats command? You have included _time in the grouping, and with so many other fields to group on, I wouldn't be surprised if all of your counts turn out to be 1 anyway.
Probably not given that you want counts of something. The question is what are you trying to count?
I am trying to count the exception messages.
Given that your search already filters on NullPointerException
| stats count
will give you the count of events matching the search in your time period. Is this what you want?
Yes, I want the combined count of the messages that are similar.
But stats count is giving only one count.
It is still not clear how you want to divide up the counts, e.g. count of exceptions by day? Count by app_name? Count by app_name and day? Including _time in the by clause of your stats command divides the counts up by timestamp, so unless you have lots of exceptions in the same millisecond, you are unlikely to get counts above 1.
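To illustrate one way of dividing things up, here is a sketch that keeps your original columns but takes _time out of the by clause and reports the most recent occurrence instead. The choice of grouping fields and the use of latest(_time) are my assumptions about what you want, not something your data confirms:

```
index=abc ns IN ("blazepsfpublish", "blazegateway", "blazegateway-c2", "blazepsfsubscribememsql", "blazepsfsubscribememsql-c2", "sidh-bulk-processor", "sidh-datagraph3", "sidh-datagraph3-c2", "sidh-noss") "NullPointerException"
| rex "message=(?<ExceptionMessage>[^\n]+)"
| stats count, latest(_time) as Time by app_name, ExceptionMessage, environment, pod_name, ns
| eval Time = strftime(Time, "%Y-%m-%d %H:%M:%S.%3N")
| rename app_name as APP_NAME, environment as Environment, pod_name as Pod_Name
```

If you would rather have counts per day, you could instead put | bin _time span=1d before the stats and add _time back to the by clause, so that all events from the same day fall into one group. Note also that stats only counts events in which every by field has a value, so it is worth checking that app_name, environment, pod_name and ns are all extracted on your events.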