Monitoring Splunk

System failures

rajatsinghbagga
Explorer

Hello Everyone,

I am trying to identify the system failure based on the below sample data :-

ABCD    AB1234  USERID SYSTEM   
ABCD    AB1234  XXXXX   
ABCD    AB1234  YYYYY   
ABCD    AB1234  ZZZZZZ  
ABCD    AB1234  FAILD   
ABCD    AB1231  USERID USER1    
ABCD    AB1231  XXXXX   
ABCD    AB1231  YYYYY   
ABCD    AB1231  ZZZZZZ  
ABCD    AB1231  FAILD   
ABEF    AB1235  USERID SYSTEM   
ABEF    AB1235  XXXXX   
ABEF    AB1235  YYYYY   
ABEF    AB1235  ZZZZZZ  
ABEF    AB1235  FAILD   
DEFG    AB1231  USERID SYSTEM   
DEFG    AB1231  XXXXX   
DEFG    AB1231  YYYYY   
DEFG    AB1231  ZZZZZZ  
DEFG    AB1231  FAILD   
DEFG    AB1231  USERID USER2    
DEFG    AB1231  XXXXX   
DEFG    AB1231  YYYYY   
DEFG    AB1231  ZZZZZZ  
DEFG    AB1231  FAILD   

First column represent JOBNAME, second JOBID and third MSGTXT

The JOBNAME and JOBID combination is unique for a process. I am trying to get the count on FAILD for only USERID SYSTEM appearing in MSGTXT field by first two chars of JOBNAME

I tried using TRANSACTION command as below but it didn't gave me expected results.

index=system_data JOBID=* JOBNAME=* 
| transaction JOBNAME JOBID keepevicted=1 startswith="*USERID SYSTEM*" endswith="*FAILD*"  
| eval JOB = substr(JOBNAME,1,2)
| stats values(eventcount) as failures by JOB
| where eventcount>0

I am expecting the output to be as

AB 2
DE 1

Please assist.

Thank you

Tags (1)
0 Karma
1 Solution

diogofgm
SplunkTrust
SplunkTrust

Use this instead of transaction:

| rex field="JOBNAME" "(?<JOB>^.{2})"
| stats values(JOB) AS JOB values(MSGTXT) AS MSGTXT by JOBNAME JOBID
| search MSGTXT = "USERID SYSTEM" MSGTXT="FAILD"
| stats count by JOB

When dealing with a lot of data stats will be more performant that transaction.
This basically extracts the 1st 2 chars in the job name and creates a field called job.
Then I group all the msgtxt by job name and job id giving me something close to what you would get using the transaction.
After, I just filter the results to get the user system and the failed message.
At to finish it I just stats it again to get the count per job.

Hope this helps.

------------
Hope I was able to help you. If so, an upvote would be appreciated.

View solution in original post

diogofgm
SplunkTrust
SplunkTrust

Use this instead of transaction:

| rex field="JOBNAME" "(?<JOB>^.{2})"
| stats values(JOB) AS JOB values(MSGTXT) AS MSGTXT by JOBNAME JOBID
| search MSGTXT = "USERID SYSTEM" MSGTXT="FAILD"
| stats count by JOB

When dealing with a lot of data stats will be more performant that transaction.
This basically extracts the 1st 2 chars in the job name and creates a field called job.
Then I group all the msgtxt by job name and job id giving me something close to what you would get using the transaction.
After, I just filter the results to get the user system and the failed message.
At to finish it I just stats it again to get the count per job.

Hope this helps.

------------
Hope I was able to help you. If so, an upvote would be appreciated.

View solution in original post

rajatsinghbagga
Explorer

Thanks @diogofgm , your SPL works like a charm, Thank you very much. But somewhere in my actual data I have a global job which is associated with every actual job and it shares the same JOBID . Because of this i am always getting this global JOB as XX with every instance of actual JOB AB or DE for this example. Since this global job shares the same JOBID as the actual jobs so I issued | dedup JOBID before the last stats command in your SPL. And i am getting the results as

AB   2
XX   2
DE   1
XX   2    

I also tried to give | where NOT JOB == "XX" but then it gives me no results.
is there any way i can suppress/hide XX for the results??

Thank you very much

0 Karma

rajatsinghbagga
Explorer

Hello @diogofgm , I was able to figure out that basically the JOBNAME is multivalue filed which contained both the global job XX along with actual jobs AB or DE so i just picked the actual job from the JOBNAME field using mvindex() function and then ran the stats on the actual job . This finally gave me the expected results.

Thank you very much !!

0 Karma

HiroshiSatoh
Champion

I do n’t know what I want to do, but just the result

| transaction JOBNAME JOBID keepevicted=1 startswith="*USERID SYSTEM*" endswith="*FAILD*"
| dedup JOBNAME
| eval JOB = substr(JOBNAME,1,2)
| stats count as failures by JOB
0 Karma
Did you miss .conf21 Virtual?

Good news! The event's keynotes and many of its breakout sessions are now available online, and still totally FREE!