Top ten Apache errors by error message

msacks
Explorer

I am trying to find the top ten Apache errors based on the error message.
Error message or message isn't a default field type, so I'm not sure how I can do this without that.

Do I need to create my own field type for message?

I would think Splunk would have this out of the box by now (unless I'm missing it).

rsennett_splunk
Splunk Employee

You are very close... but I think there are either a couple of typos or a few misconceptions regarding field extractions. Don't fret though, because everyone goes through these stages. I totally did. And part of that is the learning curve created by the "automagical" field extractions.

In your example, the regex will only grab the severity level and stop. It also calls the field "fieldname" forever... When you complete the process with the field extractor, it will ask you to name the field and then it will ask you to save. It will add that field to the sourcetype you are working with as an EXTRACT- clause. You can find it in the props.conf file.

So the regex below says:
start at the beginning
just walk past zero or more of NOT a "]": [Mon Jul 02 19:37:33 2012

(I use a "+" for that, which means one or more of NOT a "]"; same diff really here.)

now walk past the "]" when you see it: [Mon Jul 02 19:37:33 2012]

now walk past the space after that: [Mon Jul 02 19:37:33 2012] _

and then the next bit declares the <fieldname> which if you were doing it by hand, should be <message>.
The next bit of regex is the instruction of what is going to go in the field.

(?i)^[^\]]*\]\s+(?P<fieldname>[^ ]+)

[^ ] means "not a space" (close to \S, although \S excludes every whitespace character, not just the space), and the + is "one or more, greedy".
But since there is indeed a space right after the right bracket of [error], the match stops there.
[Mon Jul 02 19:37:33 2012] [error] [client 10.10.1.15] ...
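
To check this outside Splunk, here is a minimal sketch using Python's re module, which accepts the same (?P<name>...) named-group syntax. The sample event is the one quoted in this thread; Python is only a stand-in for trying the pattern, not a description of how Splunk evaluates it.

```python
import re

# Sample event quoted in this thread.
line = ("[Mon Jul 02 19:37:33 2012] [error] [client 10.10.1.15] "
        "PHP Notice:  Undefined index: profileImage in /var/www/html/index.php on line 265")

# The original pattern: walk past the first bracketed chunk and the space,
# then capture one or more non-space characters.
pattern = re.compile(r"(?i)^[^\]]*\]\s+(?P<fieldname>[^ ]+)")

match = pattern.search(line)
captured = match.group("fieldname") if match else None
print(captured)  # the capture ends at the first space after the severity
```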

So what you want after the name of the field is just a really gluttonous Kleene star (named for Stephen Kleene, the guy who invented regular expressions), which will grab everything to the end.

Also... as you have it now, the "message" field will include the severity, since you only skip past the timestamp. So you can either repeat the instructions to walk past the space, then the severity and another space... or you can do this:

I've got a repeating group that walks past the first two bracketed fields:

(?i)^([^\]]+\]\s){2}(?P<message>.*)
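
As a quick sanity check of the repeating group, the sketch below runs the same pattern through Python's re module against the sample event from this thread. It only illustrates what the regex captures; it is not Splunk's own evaluation.

```python
import re

# Sample event quoted in this thread.
line = ("[Mon Jul 02 19:37:33 2012] [error] [client 10.10.1.15] "
        "PHP Notice:  Undefined index: profileImage in /var/www/html/index.php on line 265")

# Skip two "[...] " chunks (timestamp and severity), then capture the rest of the line.
pattern = re.compile(r"(?i)^([^\]]+\]\s){2}(?P<message>.*)")

match = pattern.search(line)
message = match.group("message") if match else None
print(message)
```

Note that the captured value still begins with the [client ...] chunk, so the same error reported for different clients would be counted as different messages. Changing {2} to {3} would skip that chunk as well, assuming every event carries a client bracket.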

What I think you want is this:

index=main sourcetype="error_log-php" 
|rex field=_raw "(?i)^[^\]]+\]\s(?P<severity>[^ ]+)"
|rex field=_raw "(?i)^([^\]]+\]\s){2}(?P<message>.*)"
|top severity message

The "field=_raw" in the rex command is redundant, as _raw is the default, but I thought I'd turn you on to that...

If you want these to be "permanent" and always available at search time, find the source type definition for [error_log-php] in props.conf and add this:

EXTRACT-severity = (?i)^[^\]]+\]\s(?P<severity>[^ ]+)
EXTRACT-message  = (?i)^([^\]]+\]\s){2}(?P<message>.*)

notice... no quotes in the props.conf around the regex

Now... you can just do this:

index=main sourcetype="error_log-php" | top severity message

Or you can use the 'deconstructed' version of top (this is what's going on under the covers) and take advantage of the moving parts. Enjoy.

| stats count by severity message
| eventstats sum(count) as totalcount
| eval percent=count/totalcount*100
| sort -count
| eval rownum=1
| accum rownum
| eval severity=if(rownum>5,"OTHER",severity)
| eval message=if(rownum>5,"OTHER",message)
| eval rownum=if(rownum>5,6,rownum)
| stats sum(count) as count sum(percent) as percent by rownum severity message
| fields - rownum

With Splunk... the answer is always "YES!". It just might require more regex than you're prepared for!
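
For readers who find the pipeline easier to follow as ordinary code, here is a rough Python rendering of the same steps: count each (severity, message) pair, rank by count, keep the first few rows, and fold everything else into an OTHER row. The function name, the limit parameter, and the sample events are all made up for illustration; this is a sketch of the logic in the search above, not Splunk's implementation of top.

```python
from collections import Counter

def top_with_other(pairs, limit=5):
    """Count (severity, message) pairs; keep the top `limit` rows and fold the rest into OTHER."""
    counts = Counter(pairs)              # stats count by severity message
    total = sum(counts.values())         # eventstats sum(count) as totalcount
    rows = []
    other = 0
    # most_common() orders by descending count (sort -count);
    # enumerate stands in for the running row number (accum rownum).
    for rownum, ((severity, message), count) in enumerate(counts.most_common(), start=1):
        if rownum > limit:
            other += count               # rows past the limit collapse into OTHER
        else:
            rows.append((severity, message, count, count / total * 100))
    if other:
        rows.append(("OTHER", "OTHER", other, other / total * 100))
    return rows

# Hypothetical extracted fields, invented for illustration only.
events = (
    [("[error]", "PHP Notice: Undefined index")] * 3
    + [("[warn]", "child process still did not exit")] * 2
    + [("[error]", "File does not exist")]
    + [("[notice]", "caught SIGTERM, shutting down")]
)

rows = top_with_other(events, limit=2)
for row in rows:
    print(row)
```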
msacks
Explorer

My question still isn't being answered. I've seen this done before, so I know it's doable, but I lost the config. Unfortunately it was a PS engagement that customized it, but if any Splunk employees are listening out there, this should be a default option. The field extractor doesn't seem to do a great job of creating Apache message fields.

For example, I want to create a new permanent field for everything after the timestamp, which should be a field called message.

[Mon Jul 02 19:37:33 2012] [error] [client 10.10.1.15] PHP Notice:  Undefined index: profileImage in /var/www/html/index.php on line 265

Once this is done, I will be able to run a search such as "search here" | stats count by messageType, or something to that effect.

sowings
Splunk Employee

Yes, you can easily do "top 10" based upon the message. This requires that the message is in a field. Splunk may have some of the fields you want, based upon internal "common" log types. For Apache, there appear to be a couple like "access_combined" and "apache_error". Unfortunately, it seems that there aren't default field extractions for them, perhaps because Apache logs are so malleable.

It should be pretty easy to break apart your logs into fields, though, given either some regular expressions, or a little bit of configuration change. Have you tried using the interactive field extractor? Search for some of your events from the main search page, then click the dropdown for "extract fields". This may help you get the field you want from the data, so that you can get your top list.

sowings
Splunk Employee

While the 'search' command is agnostic about case for field values, it's not true of field NAMES. So if your rex literally has "<fieldname>", you'll want | top limit=20 fieldname.
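
As an analogy only, the same distinction shows up in Python's re module: the (?i) flag makes the matching case-insensitive, but the group name itself stays case-sensitive. The sketch below reuses the sample event from this thread and says nothing about Splunk internals.

```python
import re

# Sample event quoted in this thread.
line = ("[Mon Jul 02 19:37:33 2012] [error] [client 10.10.1.15] "
        "PHP Notice:  Undefined index: profileImage in /var/www/html/index.php on line 265")

match = re.search(r"(?i)^[^\]]*\]\s+(?P<fieldname>[^ ]+)", line)
fields = match.groupdict() if match else {}

# The field exists only under the exact name used in the pattern.
print("fieldname" in fields)
print("FIELDNAME" in fields)
```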

msacks
Explorer

Seems like I'm not the only one having this problem: http://splunk-base.splunk.com/answers/24699/top-ten-question-re-articulated

It sounds like I have to create a new field type in fields.conf based on my extraction query?

The tricky part though is now that I've grouped things based on the extractor, I am trying to group the message to do a top error.
index=main sourcetype="error_log-php" | head 10000 | rex "(?i)^[^\]]*\]\s+(?P<fieldname>[^ ]+)" | search FIELDNAME="[error]"

When I pipe it to top limit=20, there still isn't a field I can try like messageType.

I tried | top limit=20 FIELDNAME, but that yielded nothing.

I'm looking for a way to find common error message text in the body of the log, and then group them together. Should be a default function for Splunk man, it's time you guys build this in.

msacks
Explorer

Field extractor = win.

Thank you.
