Hi @jcorcorans,

I haven't found a great way to parse chef-client.log, but a few things can help:

1) Look for entries whose log level isn't INFO or WARN, for example:

[2025-02-24T19:06:07+00:00] FATAL: Please provide the contents of the stacktrace.out file if you file a bug report

2) For log rotation, we have these directives in /etc/logrotate.d/chef-client:

"/var/log/chef/client.log" {
weekly
rotate 12
compress
postrotate
systemctl reload chef-client.service >/dev/null || :
endscript
}

3) If you have many servers and run Chef frequently, you'll want to know when it's truly worth spending time debugging. We find a Chef run can fail because of a timeout or load, so check over a time period and see whether things end up running okay. We use something like this: if a host has failed more than 3 times and its latest run still isn't good, investigate.

index=your_index sourcetype=chef:client ("FATAL: Chef::Exceptions::ChildConvergeError:" OR "FATAL: Chef::Exceptions::ValidationFailed" OR "Chef run process exited unsuccessfully" OR "INFO: Chef Run complete" OR "INFO: Report handlers complete")
| eval chef_status=if(searchmatch("ERROR") OR searchmatch("FATAL"), "failed", "succeeded")
| stats count(eval(chef_status="failed")) AS num_failed, count(eval(chef_status="succeeded")) AS num_succeeded,latest(chef_status) as latest_chef_status by host
| search num_failed > 3 AND latest_chef_status!="succeeded"

To monitor the logs, a simple monitor stanza in your inputs.conf is enough:

[monitor:///var/log/chef/client.log]
sourcetype=yourchefsourcetype
index=your_index
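
If you want to try point 1 outside Splunk first, here is a minimal Python sketch that keeps only the entries whose level isn't INFO or WARN. It assumes every entry starts with the `[timestamp] LEVEL: message` layout shown in the FATAL example above; the names `LINE_RE`, `QUIET_LEVELS`, and `noteworthy_lines` are made up for illustration and aren't part of Chef or Splunk.

```python
import re

# Assumed line layout, based on the FATAL example in point 1:
# [timestamp] LEVEL: message
LINE_RE = re.compile(r"^\[(?P<ts>[^\]]+)\]\s+(?P<level>[A-Z]+):\s?(?P<msg>.*)$")

# Levels treated as routine; anything else is surfaced (hypothetical default).
QUIET_LEVELS = {"INFO", "WARN"}


def noteworthy_lines(lines):
    """Yield (timestamp, level, message) for entries that are not INFO/WARN."""
    for line in lines:
        match = LINE_RE.match(line.rstrip("\n"))
        if match is None:
            # Lines without a [timestamp] LEVEL: header (e.g. stack trace
            # continuation lines) are skipped.
            continue
        if match["level"] not in QUIET_LEVELS:
            yield match["ts"], match["level"], match["msg"]


# Small in-memory sample; the messages are taken from this post.
sample = [
    "[2025-02-24T19:06:01+00:00] INFO: Chef Run complete\n",
    "[2025-02-24T19:06:07+00:00] FATAL: Please provide the contents of the "
    "stacktrace.out file if you file a bug report\n",
]

for ts, level, msg in noteworthy_lines(sample):
    print(ts, level, msg)
```

To run it against a real log, pass an open file handle for /var/log/chef/client.log instead of `sample`. If you run chef-client at debug level, you'd probably want to add DEBUG to `QUIET_LEVELS` as well.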