Alerting

Create Splunk alerts for cron jobs

gpunjabi
New Member

I want to create a Splunk alert for a cron job that triggers when the job is not successful or did not run. Can anyone help me with this?

0 Karma

woodcock
Esteemed Legend

There are 2 ways:

1: Add >> /tmp/someSensibleFileName.txt 2>&1 to your cron string and then monitor that file.

2: Convert your job from OS cron to a Splunk scripted input using the same cron schedule string. This automatically forwards the output to Splunk, but if Splunk is not running, your job does not run either.

0 Karma

richgalloway
SplunkTrust

First you need to index your cron logs into Splunk. Then you can run a search for the name of the job that has to run. Schedule the search to run a couple of minutes after the cron job is expected to complete. If the search returns no results, trigger an alert.

---
If this reply helps you, Karma would be appreciated.
0 Karma

gpunjabi
New Member

Hello Richgalloway,
I have a scenario, described below, that I hope you can help me with.
The requirement is this: create Splunk alerts for the following jobs after deployment to the UAT and Production environments, to ensure alerts are present in the system for each job.
Make them active in UAT to test whether they trigger on the correct condition.
In Production they can be turned on when the market starts using these jobs.

The cron job is full-wcpIndexAT-cronjob-, which runs at 3:05 AM every day.

0 Karma

richgalloway
SplunkTrust

Are you indexing the cron log (/var/log/cron)? Does it contain an event when full-wcpIndexAT-cronjob- completes? If the answer to both of those questions is yes, then you can schedule a search to run at 3:10 am every day that searches for the job completion string. If the string is not found, send an alert.
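A minimal sketch of that scheduled search, under some assumptions: the index name `os` is a placeholder for wherever the cron log is indexed, and the quoted job string is taken as written in this thread and stands in for whatever completion text the log actually contains. Because `stats count` always returns one row, the `where` clause leaves a single result only when nothing was found.

```spl
index=os source="/var/log/cron" "full-wcpIndexAT-cronjob-" earliest=-10m@m latest=now
| stats count AS completions
| where completions=0
```

Saved as an alert with the cron schedule `10 3 * * *`, this could trigger when the number of results is greater than 0. Dropping the last two lines and triggering when the number of results equals 0 would be an equivalent setup.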

0 Karma

gpunjabi
New Member

Hello Rich,

Thank you for your answer. We want to trigger an alarm when the job fails. We have already configured this in our AWS CloudWatch monitoring and now need to integrate it with Splunk.

0 Karma

gpunjabi
New Member

Hi Rich, I have written a query for a single cron job, but I want to use a regex so that the same query can run for the other jobs as well.
sourcetype=hybris_console "full-wcpIndexCA-cronjob" earliest=-80m | rex field=_raw "[(?\w+-\w+-\w+)::" | eval Period=if(_time>=relative_time(now(),"-24h"),1,2) | stats min(Period) as periodMin max(Period) as periodMax by cronType | eval stopped = if(periodMin=2,"true", "false") | eval restarted = if(stopped="false" AND periodMax=1,"true", "false") | where stopped="true" OR restarted="true" | table cronType stopped restarted.

0 Karma

richgalloway
SplunkTrust

If the event text is similar for each cron job, this should work. I wonder only about the "full-wcpIndexCA-cronjob" part - is that specific to one job?

0 Karma

gpunjabi
New Member

Hi Rich, somehow my comment is not showing up when I try to post it. Apologies if you are receiving repeated notifications.

0 Karma

richgalloway
SplunkTrust

Your comment was sent for moderation since your Karma score is still low.

0 Karma

gpunjabi
New Member

Yes Rich, it is specific to the CA cron job. We have multiple cron jobs.
1. For example:
full-wcpIndexDE-cronjob
full-wcpIndexBE-cronjob
full-wcpIndexIT-cronjob
full-wcpIndexSE-cronjob
full-wcpIndexPT-cronjob
full-wcpIndexNO-cronjob

2. I also want this query to check whether the alarm ran in the last 24 h.

I have tried using full-wcpIndex[A-Z]{2}-cronjob in the query, but it does not seem to work. Can you please help me with the above points?
0 Karma

richgalloway
SplunkTrust
  1. Try sourcetype=hybris_console "full-wcpindex*-cronjob" earliest=-80m | rex field=_raw "[(?\w+-\w+-\w+)\::" | eval Period=if(_time>=relative_time(now(),"-24h"),1,2) | stats min(Period) as periodMin max(Period) as periodMax by cronType | eval stopped = if(periodMin=2,"true", "false") | eval restarted = if(stopped="false" AND periodMax=1,"true", "false") | where stopped="true" OR restarted="true" | table cronType stopped restarted. You will need to fix the rex command as it was garbled in your comment.
  2. One cannot use regex meta-characters in a base search, only in certain commands and functions such as rex, regex, replace, and match.
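To illustrate the second point, here is a hedged sketch that keeps a wildcard in the base search and moves the `[A-Z]{2}` pattern into the `regex` and `rex` commands, where regular expressions are allowed. The field name `cronType` follows the `by cronType` clause used in this thread; the pattern assumes every job is named like the examples given above.

```spl
sourcetype=hybris_console "full-wcpIndex*-cronjob" earliest=-80m
| regex _raw="full-wcpIndex[A-Z]{2}-cronjob"
| rex field=_raw "(?<cronType>full-wcpIndex[A-Z]{2}-cronjob)"
| stats count BY cronType
```

The wildcard narrows the events cheaply at search time, and `regex` then discards anything that matched the wildcard but not the stricter two-letter pattern.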
0 Karma

gpunjabi
New Member

It is throwing an error: Error in 'rex' command: Encountered the following error while compiling the regex '[(?\w+-\w+-\w+)::': Regex: invalid range in character class. I think "\[(?<cronType>\w+\-\w+\-\w+)\::" will work. Sorry, the regex part went missing from my earlier message; it was actually there in my query. Is it fine now, Rich?

0 Karma

richgalloway
SplunkTrust

Does it work?

0 Karma

gpunjabi
New Member

Yeah Rich, the query works and shows the cron jobs that are present in the log, but now I want to create an alarm from it. I am not sure whether a result indicates a failure or not.

0 Karma

richgalloway
SplunkTrust

Once you have your search for cron jobs, add a where clause for the number of expected results. ... | stats count | where count < x. Trigger the alarm if the number of results is not zero.
You could also compare the list of jobs to an expected list in a lookup file. If any jobs are missing you'll have results. Trigger the alarm if there are any.
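The lookup comparison might look like the sketch below. It assumes a hypothetical lookup file `expected_cronjobs.csv` with one column, `cronType`, listing every job that should run. Neither the file nor its layout comes from this thread, and the `rex` pattern assumes the job name is followed by `::` as in the logged events.

```spl
| inputlookup expected_cronjobs.csv
| fields cronType
| join type=left cronType
    [ search sourcetype=hybris_console "full-wcpIndex*-cronjob" earliest=-24h
      | rex field=_raw "(?<cronType>full-wcpIndex[A-Z]{2}-cronjob)::"
      | stats count AS runs BY cronType ]
| fillnull value=0 runs
| where runs=0
| table cronType
```

Each remaining row is an expected job with no events in the last 24 hours, so the alert can trigger when the number of results is greater than 0.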

0 Karma

gpunjabi
New Member

Hi Rich, my earlier query seems to be complex. I need to simplify it as much as I can, and I need to trigger my search as an event. I can trigger it at 7:00 AM so that it shows the failures (most importantly) of all cron jobs. Usually the jobs start at 2:45 AM and run until 5:55 AM; after all jobs have completed, the events are generated and I can see the stats.

sourcetype=hybris_console "full-wcpIndexCA-cronjob" earliest=-80m | rex field=_raw "\[(?<cronType>\w+\-\w+\-\w+)\::" | eval Period=if(_time>=relative_time(now(),"-24h"),1,2) | stats min(Period) as periodMin max(Period) as periodMax by cronType | eval stopped = if(periodMin=2,"true", "false") | eval restarted = if(stopped="false" AND periodMax=1,"true", "false") | where stopped="true" OR restarted="true" | table cronType stopped restarted

Could you help me simplify my query under the above conditions?

Also, I didn't understand the solution in your previous comment.

Sample Failure event

3/2/19
4:25:07.538 AM

INFO | jvm 1 | main | 2019/03/02 04:25:07.538 | WARN full-wcpIndexFR-cronjob::de.hybris.platform.servicelayer.internal.jalo.ServicelayerJob [SolrIndexerJob] Error during indexer call: frIndex
host = uat_hybris_system_0 source = /opt/sap/hybris/log/tomcat/console-20190302.log

Image url- [1]: https://storage.googleapis.com/splunk-alert/Screen%20Shot%202019-03-02%20at%209.48.27%20PM.png

I would request you to kindly help me here and will wait for your response.

0 Karma

richgalloway
SplunkTrust

The best way to improve your query is to add an index specification before the first |.

Using the resulting query, you can save it as an alert and have the alert trigger when the number of results is fewer than the expected number of cron jobs.
(You can disregard my last comment.)
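As a rough sketch of both suggestions: the search below starts with an index specification and returns a row only when fewer distinct jobs were seen than expected. The index name `leap_uat_logs` is the one that appears later in this thread; the expected count of 7 is only a placeholder and should be replaced with the real number of cron jobs.

```spl
index=leap_uat_logs sourcetype=hybris_console "full-wcpIndex*-cronjob" earliest=-24h
| rex field=_raw "(?<cronType>full-wcpIndex[A-Z]{2}-cronjob)::"
| stats dc(cronType) AS jobsSeen
| where jobsSeen < 7
```

With the threshold inside the search, the saved alert can simply trigger when the number of results is greater than 0.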

0 Karma

gpunjabi
New Member

Okay Rich, I have one more idea, though I am not sure whether it will work: we could catch the aborted job's error message from the _raw field and use that error message as an alert. I am not sure whether this is the right approach. I have written a sample query with the error message in the search string, to be used as an alert for all cron jobs.

Also, the above query is only for the CA job, and I want it to work for all cron jobs. When I tried full-wcpIndex*-cronjob in the old query, it did not show proper output.

The query is
index=leap_uat_logs sourcetype=hybris_console "full-wcpIndex*-cronjob" "Error during indexer call: Index" NOT debug source=/opt/sap/hybris/log/tomcat/console

The sample result.

3/2/19
4:25:07.538 AM
INFO | jvm 1 | main | 2019/03/02 04:25:07.538 | WARN full-wcpIndexFR-cronjob::de.hybris.platform.servicelayer.internal.jalo.ServicelayerJob [SolrIndexerJob] Error during indexer call: frIndex
host = uat_hybris_system_0 source = /opt/sap/hybris/log/tomcat/console-20190302.log

Thanks
Gaurav

0 Karma

richgalloway
SplunkTrust

If the query produces the desired results, then it works. I'm not sure what I can add.
Your earlier queries were quite specific about the contents because of the rex command. A more general query will tend to find events that do not match the regular expression in the rex.
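One possible middle ground, offered as a sketch rather than a definitive answer: keep the general error-message search, then use `rex` to pull out the job name so the alert reports which job failed. The quoted phrase stops at "Error during indexer call" because the sample event ends in `frIndex`, which suggests the trailing word varies by job. The `earliest=-24h` window is an assumption.

```spl
index=leap_uat_logs sourcetype=hybris_console "full-wcpIndex*-cronjob" "Error during indexer call" earliest=-24h
| rex field=_raw "(?<cronType>full-wcpIndex[A-Z]{2}-cronjob)::"
| stats count AS failures latest(_time) AS lastFailure BY cronType
| convert ctime(lastFailure)
```

Scheduled after the jobs finish and set to trigger when the number of results is greater than 0, this would list one row per failed job.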

0 Karma

gpunjabi
New Member

So is this correct? I have taken the error message from the _raw field, used it in the search, and created an alert. I have sent you a sample image as well. Can't we refine it any further?

0 Karma