VMware ESXi vmkernel error search

Communicator

Hi Splunkers.

A year ago we had a hardware issue that disabled our operation for 24 hours. The VMware vmkernel error looked like this:

2015-11-09T21:55:08.687Z cpu28:37026)MCE: 222: cpu28: bank7: status=0x8c00004000010090: (VAL=1, OVFLW=0, UC=0, EN=0, PCC=0, S=0, AR=0), ECC=no, Addr:0x1425a5200 (valid), Misc:0x42ef6f0000 (valid)

Now that we have Splunk, I am trying to set up a search that specifically tracks these errors. I want the date/time, the CPU, and the keyword "MCE".

I borrowed and modified a search from the VMware app that looks like this:

sourcetype=vmware:esxlog:vmkernel *  * * * * * * | head 10000 | rex field=sourcetype "^vmware:esxlog:(?<sublogger>.+)$" | rex field=Message "^(?:[^ \n]* ){7}(?P<CPU>[^\)]+)\)(?P<CPU_Message>.+)" | eval Time=_time | convert ctime(Time) | table Time, host, CPU, CPU_Message | Rename host as Host, CPU_Message as "Message (if any)"

My first question is, what are all those *'s for? I know that * is a wildcard, but in the VMware app's search, what do the multiple *'s do?

For the two rex fields, I used the field extractor and extracted the cpu28:37026 part from the above log, but I also want the MCE: part.

My search mostly works. I am getting the time, host, that CPU field, and then a message that doesn't usually contain the MCE errors (or anything useful). How do I make it show time, host, CPU, and then either an MCE or MCA error, but only if an MCE or MCA error exists?

Thanks in advance!

SplunkTrust

You are right about those stars: I don't know why the extras are in there, but I know they're not needed. Also, the head 20 is defeating the purpose, because it keeps only the first 20 events returned (the 20 most recent). So here's what I'd suggest:

First, try the simple search

sourcetype=vmware:esxlog:vmkernel (MCE OR MCA OR Error)

That should return the events containing MCE, MCA, or the word Error. It might be precisely what you need, but it might also return a few spurious "error" lines.

Another option is to build a new field out of where that shows up, and then only display the events that have it. Here's one way to do it:

sourcetype=vmware:esxlog:vmkernel | rex "cpu\d+:\d+\)(?<errcode>MCE|MCA)" | search errcode=*

So that looks for the literal string "cpu" followed by some digits (cpu\d+, as in cpu28), a colon and more digits (:\d+, like :65656), a closing parenthesis, and then either the string MCE or MCA. You could add |Error as a third option too by making that last piece (?<errcode>MCE|MCA|Error). (The pipe INSIDE the regex is a regex "or".) The last piece, | search errcode=*, keeps only the events where errcode is set to something. With this search, you should get back all the events that have MCA, MCE or Error (assuming you added Error!) right after the closing parenthesis, but nothing else. These are, if I'm right, the events of interest to you.
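If you want to sanity-check that pattern outside Splunk, here is a minimal sketch in Python, assuming Python's re module behaves like rex for this simple pattern (Python spells the named group (?P<errcode>...)). The first event is the line from the question; the second is a made-up non-error line, and the helper name extract_errcode is purely illustrative.

```python
import re

# Same pattern as the rex above, with Python's (?P<name>...) group syntax.
PATTERN = re.compile(r"cpu\d+:\d+\)(?P<errcode>MCE|MCA)")

events = [
    # The vmkernel line from the question.
    "2015-11-09T21:55:08.687Z cpu28:37026)MCE: 222: cpu28: bank7: "
    "status=0x8c00004000010090: (VAL=1, OVFLW=0, UC=0, EN=0, PCC=0, S=0, AR=0), "
    "ECC=no, Addr:0x1425a5200 (valid), Misc:0x42ef6f0000 (valid)",
    # Hypothetical non-error line, included only to show the filtering.
    "2015-11-09T21:55:09.000Z cpu3:12345)Hypothetical: nothing wrong here",
]


def extract_errcode(event):
    """Return 'MCE' or 'MCA' if the event matches the pattern, else None."""
    match = PATTERN.search(event)
    return match.group("errcode") if match else None


# Rough equivalent of `| search errcode=*`: keep events where the field exists.
matching = [e for e in events if extract_errcode(e) is not None]
print(len(matching), extract_errcode(events[0]))  # prints: 1 MCE
```

Only the first event survives the filter, which is the behaviour you would expect from the rex plus search errcode=* combination.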

Once you have the right (hopefully few) events displaying - just the ones with the error - you can start adding the rest of your search back in one piece at a time - they seem reasonably straightforward and a little thinking on them will probably be all you need to understand what they do. But if one's still not obvious to you, ask away!

SplunkTrust

By the way, assuming all the rest of the search is needed (I don't think the two "rex" statements you have are, but there's no way I can know that for sure), your whole search would be

sourcetype=vmware:esxlog:vmkernel | rex "cpu\d+:\d+\)(?<errcode>MCE|MCA)" | search errcode=* | rex field=sourcetype "^vmware:esxlog:(?<sublogger>.+)$" | rex field=Message "^(?:[^ \n]* ){7}(?P<CPU>[^\)]+)\)(?P<CPU_Message>.+)" | eval Time=_time | convert ctime(Time) | table Time, host, CPU, CPU_Message | Rename host as Host, CPU_Message as "Message" 

SplunkTrust

Oh, and if you don't need the "sublogger" field for this, or the various CPU and CPU_Message fields, you can skip those rex statements. Without them the CPU and CPU_Message columns would be empty, so table the errcode field instead:

sourcetype=vmware:esxlog:vmkernel | rex "cpu\d+:\d+\)(?<errcode>MCE|MCA)" | search errcode=* | eval Time=_time | convert ctime(Time) | table Time, host, errcode | rename host as Host
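If all you are after is the time, the CPU and the error keyword from the question, adding one more capture group to the errcode pattern may be enough. This is a sketch in Python rather than SPL so it can be run anywhere; the group name cpu and the helper parse are my own illustrative choices, not anything from the VMware app. The SPL counterpart would presumably be rex "(?<CPU>cpu\d+):\d+\)(?<errcode>MCE|MCA)", which I have not run against your data.

```python
import re

# Hypothetical extension of the errcode pattern: also capture the CPU label.
PATTERN = re.compile(r"(?P<cpu>cpu\d+):\d+\)(?P<errcode>MCE|MCA)")

# The vmkernel line from the question.
line = (
    "2015-11-09T21:55:08.687Z cpu28:37026)MCE: 222: cpu28: bank7: "
    "status=0x8c00004000010090: (VAL=1, OVFLW=0, UC=0, EN=0, PCC=0, S=0, AR=0), "
    "ECC=no, Addr:0x1425a5200 (valid), Misc:0x42ef6f0000 (valid)"
)


def parse(event):
    """Return (timestamp, cpu, errcode), or None if there is no MCE/MCA marker."""
    match = PATTERN.search(event)
    if match is None:
        return None
    timestamp = event.split(" ", 1)[0]  # first space-delimited token of the line
    return timestamp, match.group("cpu"), match.group("errcode")


print(parse(line))  # prints: ('2015-11-09T21:55:08.687Z', 'cpu28', 'MCE')
```

In Splunk itself you would not need to parse the timestamp, since _time already carries it.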

Communicator

Thank you so much for your thoughtful and complete answer!

Communicator

Okay, I solved the first part of my problem. I just needed to add | where isnotnull(Message).

Here's my current search string:

sourcetype=vmware:esxlog:vmkernel * * * * * * * | head 20 | rex field=sourcetype "^vmware:esxlog:(?<sublogger>.+)$" | rex field=Message "^(?:[^ \n]* ){7}(?P<CPU>[^\)]+)\)(?P<CPU_Message>.+)" | eval Time=_time | convert ctime(Time) | table Time, host, CPU, CPU_Message | rename host as Host, CPU_Message as "Message" | where isnotnull(Message)

Now, I only want to display something if there are certain keywords in the Message field, like "MCE", "MCA" and "Error". I am not sure how to do that.
