I am looking to monitor disk I/O errors. Is there any way to do this?
Currently we filter disk-related hardware error messages into the format below and redirect the output to a Splunk-readable log file. We then alert if that log file contains the string "I/O error".
The command we use to convert the hardware messages into a Splunk-readable format:
# dmesg -L -T| grep -iE "I/O error"|tr -d '['| awk -F']' '{print $1 "," $2}'
Thu Oct 1 00:01:00 2020, blk_update_request: I/O error, dev fd0, sector 0
Fri Oct 2 00:01:00 2020, blk_update_request: I/O error, dev fd0, sector 0
Fri Oct 2 00:01:00 2020, blk_update_request: I/O error, dev fd0, sector 0
But this is not a feasible way to monitor, because the command doesn't work on all Linux versions. Is there a default app available to monitor disk I/O errors?
Thu Oct 1 00:01:00 2020, blk_update_request: I/O error, dev fd0, sector 0
Fri Oct 2 00:01:00 2020, blk_update_request: I/O error, dev fd0, sector 0
Fri Oct 2 00:01:00 2020, blk_update_request: I/O error, dev fd0, sector 0
This is the output. Currently my script works only on Red Hat 7 and CentOS 7; I need a common way to monitor on all versions.
Which versions of Linux are you using?
Red Hat 7, CentOS 7, Red Hat 6, CentOS 6, SUSE Linux 12, Ubuntu 16, Amazon Linux AMI
Does the output of dmesg look the same on all systems?
Does grep work the same way with those options on all systems?
You need to look at what your script is doing at each stage to figure out what is different on the failing systems compared to the working systems.
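As a rough illustration of separating the stages, here is a minimal sketch that splits the pipeline into two functions. The names format_io_errors and read_kernel_log are made up for this example. It assumes that the failing systems ship an older dmesg that rejects the -T option, which has not been confirmed in this thread, and that every kernel line starts with a bracketed timestamp.

```shell
#!/bin/sh
# Hypothetical helper: read kernel log lines on stdin, keep only those that
# mention "I/O error", and turn "[stamp] message" into "stamp, message".
# Works for both "[Thu Oct  1 00:01:00 2020]" and "[   12.345678]" stamps.
format_io_errors() {
    grep -i 'I/O error' | sed -e 's/^\[ *\([^]]*\)\] */\1, /'
}

# Hypothetical wrapper: use human-readable timestamps when this dmesg
# accepts -T, otherwise fall back to plain dmesg (assumption: older
# versions exit non-zero on the unknown option).
read_kernel_log() {
    if dmesg -T >/dev/null 2>&1; then
        dmesg -T
    else
        dmesg
    fi
}
```

Running `read_kernel_log | format_io_errors` on a working and a failing host, and then each function on its own, should show which stage behaves differently. On hosts where the fallback is used, the timestamp is seconds since boot rather than a calendar date.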
The dmesg command itself doesn't work on all flavours of Linux.
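If dmesg cannot be relied on, one possible alternative is to read the kernel messages that syslog already writes to disk. The sketch below assumes the usual default locations (/var/log/kern.log on Ubuntu, /var/log/messages on Red Hat, CentOS and SUSE); these paths depend on the syslog configuration and have not been verified for the hosts in this thread. The function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the first readable file among the arguments.
first_readable() {
    for f in "$@"; do
        if [ -r "$f" ]; then
            printf '%s\n' "$f"
            return 0
        fi
    done
    return 1
}

# Hypothetical helper: print the lines of one log file that mention "I/O error".
io_errors_in_file() {
    grep -i 'I/O error' "$1"
}
```

For example, `log=$(first_readable /var/log/kern.log /var/log/messages) && io_errors_in_file "$log"` prints the matching lines from whichever file exists. Since these are ordinary log files, they could also be monitored directly instead of going through an intermediate file.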
Do you have an example of the output from dmesg from each of these?