Hi Splunkers.
A year ago we had a hardware issue that disabled our operation for 24 hours. The VMware vmkernel error looked like this:
2015-11-09T21:55:08.687Z cpu28:37026)MCE: 222: cpu28: bank7: status=0x8c00004000010090: (VAL=1, OVFLW=0, UC=0, EN=0, PCC=0, S=0, AR=0), ECC=no, Addr:0x1425a5200 (valid), Misc:0x42ef6f0000 (valid)
Now that we have Splunk, I am trying to set up a search that specifically tracks these errors. I want the date/time, the CPU, and the keyword "MCE".
I borrowed and modified a search from the VMware app that looks like this:
sourcetype=vmware:esxlog:vmkernel * * * * * * * | head 10000 | rex field=sourcetype "^vmware:esxlog:(?<sublogger>.+)$" | rex field=Message "^(?:[^ \n]* ){7}(?P<CPU>[^\)]+)\)(?P<CPU_Message>.+)" | eval Time=_time | convert ctime(Time) | table Time, host, CPU, CPU_Message | Rename host as Host, CPU_Message as "Message (if any)"
My first question is, what are all those * for? I know that an * is a wildcard, but in the VMware app, what do the multiple *'s do?
For the two rex fields, I used the field extractor and extracted the cpu28:37026 part from the above log, but I also want the MCE: part.
My search mostly works. I am getting the time, host, that CPU field, and then a message that doesn't usually contain the MCE errors (or anything useful). How do I make it show the time, host, and CPU, and then either an MCE or MCA error, but only if an MCE or MCA error exists?
Thanks in advance!
You are right about those stars: I don't know why the extras are in there, but I know they're not needed. Also, the head 20 is defeating the purpose by only showing you the 20 most recent events. So here's what I'd suggest:
First, try the simple search
sourcetype=vmware:esxlog:vmkernel (MCE OR MCA OR Error)
That should return the events with MCE, MCA or the word Error in them. That might be precisely what you need, but it might also return a few spurious "error" lines.
Another option is to build a new field out of where that shows up, and then only display the events that had it. Here's one way to do it:
sourcetype=vmware:esxlog:vmkernel | rex "cpu\d+:\d+\)(?<errcode>MCE|MCA)" | search errcode=*
So that looks for the literal string "cpu" followed by some digits ( cpu\d+ as in cpu28), a colon and more digits ( :\d+ like :65656), a closing parenthesis and then either the string MCE or MCA. You could add |Error as a third option too by making that last piece (?<errcode>MCE|MCA|Error). (The pipe INSIDE the regex is a regex "or".) The last piece, | search errcode=*, says search for where errcode is set to something. With this search, you should get back all the events with MCA, MCE or Error (assuming you added Error!) but nothing else. These are, if I'm right, the events of interest to you.
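If you want to check the regex before running it against real data, one option is to feed it the sample line from the question. This is only a sketch: it assumes your Splunk version has the makeresults command, and it builds a single fake event whose _raw is the log line quoted above.

```spl
| makeresults
| eval _raw="2015-11-09T21:55:08.687Z cpu28:37026)MCE: 222: cpu28: bank7: status=0x8c00004000010090: (VAL=1, OVFLW=0, UC=0, EN=0, PCC=0, S=0, AR=0), ECC=no, Addr:0x1425a5200 (valid), Misc:0x42ef6f0000 (valid)"
| rex "cpu\d+:\d+\)(?<errcode>MCE|MCA)"
| table errcode
```

If the pattern matches the way I described, the errcode column should come back populated for that one event. If it comes back empty, the regex needs adjusting before you point it at the vmkernel sourcetype.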
Once you have the right (hopefully few) events displaying - just the ones with the error - you can start adding the rest of your search back in one piece at a time - they seem reasonably straightforward and a little thinking on them will probably be all you need to understand what they do. But if one's still not obvious to you, ask away!
By the way, assuming all the rest of the search is needed (I don't think the two "rex" statements you have are, but there's no way I can know that for sure), your whole search would be
sourcetype=vmware:esxlog:vmkernel | rex "cpu\d+:\d+\)(?<errcode>MCE|MCA)" | search errcode=* | rex field=sourcetype "^vmware:esxlog:(?<sublogger>.+)$" | rex field=Message "^(?:[^ \n]* ){7}(?P<CPU>[^\)]+)\)(?P<CPU_Message>.+)" | eval Time=_time | convert ctime(Time) | table Time, host, CPU, CPU_Message | Rename host as Host, CPU_Message as "Message"
Oh, and if you don't need the "sublogger" field for this nor the various CPU and CPU_Message fields, you can skip all that.
sourcetype=vmware:esxlog:vmkernel | rex "cpu\d+:\d+\)(?<errcode>MCE|MCA)" | search errcode=* | eval Time=_time | convert ctime(Time) | table Time, host, errcode, _raw | rename host as Host, errcode as "Error"
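Since you said you wanted the CPU as well, here is a sketch that captures it in the same rex as the error keyword instead of relying on the Message field. It assumes the raw event contains the cpuNN:NNNNN)MCE text exactly as in your sample line. The leading (MCE OR MCA) terms are my addition to narrow the events before the regex runs, and "Error Type" is just a column label I picked.

```spl
sourcetype=vmware:esxlog:vmkernel (MCE OR MCA)
| rex "(?<CPU>cpu\d+:\d+)\)(?<errcode>MCE|MCA)"
| search errcode=*
| eval Time=_time
| convert ctime(Time)
| table Time, host, CPU, errcode
| rename host as Host, errcode as "Error Type"
```

For your sample event, CPU would be the cpu28:37026 part and errcode the MCE part, which I think is the pair you were after.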
Thank you so much for your thoughtful and complete answer!
Okay, I solved the first part of my problem. Just needed to add | where Message NOT null.
Here's my current search string:
sourcetype=vmware:esxlog:vmkernel * * * * * * * | head 20 | rex field=sourcetype "^vmware:esxlog:(?<sublogger>.+)$" | rex field=Message "^(?:[^ \n]* ){7}(?P<CPU>[^\)]+)\)(?P<CPU_Message>.+)" | eval Time=_time | convert ctime(Time) | table Time, host, CPU, CPU_Message | Rename host as Host, CPU_Message as "Message" | where Message NOT null
Now, I only want to display something if there are certain keywords in the Message field, like "MCE", "MCA" and "Error". I am not sure how to do that.
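One possible way to do the keyword filter, keeping the Message-based extraction from the search above, is a where clause with the eval function match(). This is a sketch rather than a tested answer for this environment: it assumes the Message field exists for this sourcetype and that CPU_Message is populated by the second rex.

```spl
sourcetype=vmware:esxlog:vmkernel
| rex field=Message "^(?:[^ \n]* ){7}(?P<CPU>[^\)]+)\)(?P<CPU_Message>.+)"
| where match(CPU_Message, "MCE|MCA|Error")
| eval Time=_time
| convert ctime(Time)
| table Time, host, CPU, CPU_Message
| rename host as Host, CPU_Message as "Message"
```

The filter runs before the rename, so it refers to CPU_Message rather than Message. match() is case-sensitive by default; starting the pattern with (?i) should make it ignore case. Events where CPU_Message was never extracted do not pass the where clause, so a separate null check should not be needed.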