Alerting

In Splunk, is there a way to set up an email alert for when scheduled jobs have error messages?

rrobe07
New Member

We have some scheduled jobs that, as I recently noticed on the Jobs page, have error messages ("max_mem_usage_mb has been reached" in our case). I wasn't aware that these searches were producing incorrect results because they ran out of memory. Is there a way to set up an email alert so that we are notified when scheduled jobs have error messages? I can find the messages in var/run/splunk/dispatch, but that data doesn't appear to be searchable (the way _internal is, for instance); if it were, I could set up a scheduled search to detect these occurrences. If the error messages are not searchable, how can we be notified?

Also, I can find the job run in index=_internal (sourcetype = scheduler), but the entry says "status=success" even though the Jobs page lists an error.
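
Since the messages are visible on disk under var/run/splunk/dispatch, one possible stopgap is to scan that directory outside of Splunk. The following is only a minimal sketch: the function name `find_jobs_with_message` is hypothetical, it assumes each job has its own subdirectory under the dispatch directory, and it simply greps every readable file for the message text rather than relying on any particular file name or format.

```python
import os


def find_jobs_with_message(dispatch_dir, needle="max_mem_usage_mb"):
    """Map each job directory name to the files inside it that contain `needle`.

    Assumption: every immediate subdirectory of `dispatch_dir` is one job.
    """
    hits = {}
    if not os.path.isdir(dispatch_dir):
        return hits
    for job in sorted(os.listdir(dispatch_dir)):
        job_path = os.path.join(dispatch_dir, job)
        if not os.path.isdir(job_path):
            continue
        for root, dirs, files in os.walk(job_path):
            dirs.sort()  # keep traversal order deterministic
            for name in sorted(files):
                path = os.path.join(root, name)
                try:
                    # Undecodable bytes (e.g. compressed results) are replaced, not fatal.
                    with open(path, "r", errors="replace") as fh:
                        found = any(needle in line for line in fh)
                except OSError:
                    continue  # file vanished or is unreadable; skip it
                if found:
                    hits.setdefault(job, []).append(os.path.relpath(path, job_path))
    return hits
```

Calling `find_jobs_with_message("/opt/splunk/var/run/splunk/dispatch")` (the /opt/splunk prefix is an assumed install location) from a cron job and mailing any non-empty result would give a rough notification path. Job directories are cleaned up after a job expires, so the scan would need to run more often than the jobs' lifetime.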

0 Karma

ragedsparrow
Contributor

I think this usually comes up in the splunkd.log:

02-15-2019 09:56:05.815 ERROR StatsProcessor - Reached limit max_mem_usage_mb (200 MB), results may be incomplete! Please increase the max_mem_usage_mb in limits.conf .

You may be able to build an alert using something like this as a base search:

index=_internal sourcetype=splunkd component=StatsProcessor log_level=ERROR max_mem_usage_mb

I haven't encountered this error, so this is just my best guess here.
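
If the base search returns nothing, it may be worth checking whether the message reaches splunkd.log on disk at all. Here is a rough sketch that applies the same three filters (component, log level, keyword) to raw log lines. The regular expression is inferred only from the sample line above, the optional timezone offset after the timestamp is an assumption, and `match_mem_limit` is a hypothetical helper name.

```python
import re

# Layout inferred from the sample line: "<date> <time> <LEVEL> <Component> - <message>".
# Assumption: some installations add a "+0000"-style offset after the timestamp.
LINE_RE = re.compile(
    r"^(?P<timestamp>\d{2}-\d{2}-\d{4} \d{2}:\d{2}:\d{2}\.\d{3})"
    r"(?: [+-]\d{4})?"
    r" (?P<log_level>[A-Z]+)\s+(?P<component>\S+) - (?P<message>.*)$"
)


def match_mem_limit(line):
    """Return details if `line` is a StatsProcessor ERROR about max_mem_usage_mb, else None."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    if m["log_level"] != "ERROR" or m["component"] != "StatsProcessor":
        return None
    if "max_mem_usage_mb" not in m["message"]:
        return None
    limit = re.search(r"max_mem_usage_mb \((\d+) MB\)", m["message"])
    return {
        "timestamp": m["timestamp"],
        "limit_mb": int(limit.group(1)) if limit else None,
    }
```

Running this over each line of splunkd.log and finding no matches would suggest the message is only written to the job's own dispatch directory, in which case an alert built on index=_internal would never fire.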

0 Karma

rrobe07
New Member

Not even this basic search returns anything related to the job that failed:
index=_internal max_mem_usage_mb

0 Karma