Top Ten question re-articulated

mwlarsen
Explorer

Say I have multiple sources of JBoss logs, such as server.log, geo.log, and feature.log, plus gzipped archives containing earlier versions of those logs. Each log contains entries in log4j format. I want to parse all the logs collectively and return only the log entries that contain the word "ERROR", up to the first newline. I want to exclude all log entries that contain the words "INFO", "WARN", "DEBUG", etc. In the example log entries below, using Regex Coach, the regex -

ERROR.*?\n

matches the portion of text I want to extract from the errors. Additionally, the regex doesn't match any log entry I want excluded from the search:

2011-05-10 02:01:11,799 [ThreadPool worker thread #21] ERROR com.geodyne.overt.runtime.engine.FlowObjectExecutionTreeNode - deliverException(...)com.geodyne.magma.GeoException: Runtime error in script ("Process: 'GeoDoc GeoCode LAVA Cache' ProcessItem: 'Get GeoCode Location' Type: 'ITEM'" 18:0).Internal Script error: com.geodyneinc.magma.common.util.service.exceptions.BaseSystemException: Error while calling calderaLocation or parsing the response from GeoCode because of HTML response

[ErrorInfo[

featureId=null

featureNumber=null

featureMapId=null

errorType=RECOVERABLE

externalErrorCode=null

message=Error while calling calderaLocation or parsing the response from GeoCode because of
HTML response

serviceName=GeoGen

severityType=null

timeStamp=2011-05-09 23:16:18.203

stackTrace=null
]]

2011-05-10 02:01:56,360 [ThreadPool worker thread #22] ERROR com.geodyne.server.ejb.workflow.EJBWorkflowManagerBean - Exception occurred, e = com.geodyne.component.common.workflow.WorkflowProcessItemException: Runtime error in script ("Process: 'EruptionEvasionRetryService' ProcessItem: 'call connector' Type: 'ITEM'" 4:0).Internal Script error: com.geodyneinc.magma.common.util.service.exceptions.BaseSystemException: Error in persisting certificate images into GeoCode 46555calderaLocation: <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">

<HTML><HEAD><TITLE>You are not authorized to view this page</TITLE>

<META HTTP-EQUIV="Content-Type" Content="text/html; charset=Windows-1252">

<STYLE type="text/css">

BODY { font: 8pt/12pt verdana }

H1 { font: 13pt/15pt verdana }

H2 { font: 8pt/12pt verdana }

A:link { color: red }

A:visited { color: maroon }

</STYLE>

</HEAD><BODY><TABLE width=500 border=0 cellspacing=10><TR><TD>

<h1>You are not authorized to view this page</h1>
You do not have permission to view this directory or page using the credentials that you supplied.

<hr>

<p>Please try the following:</p>

<ul>

2011-05-10 02:02:44,865 [ThreadPool worker thread #25] ERROR com.geodyne.magma.script.js.GeoJavaScriptException - GeoJavaScriptException(), nested exception:

2011-05-10 02:02:44,867 [ThreadPool worker thread #25] ERROR com.geodyne.server.ejb.workflow.EJBWorkflowManagerBean - Message: server.ejb.workflow.impl.EJBWorkflowManagerBean.exception Arguments: ExecutionStack(ExecutionJob(worker(componentName = Script), processItemId = 6071, processTiming = N, saveExecutionContextBehaviour = EXECUTION_CONTEXT_DO_NOT_SAVE)), SymbolTable(SymbolTable(...)), sharedData = null
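As a rough check outside Regex Coach, the behaviour of that regex can be sketched in Python. This is only an illustration, not how Splunk evaluates anything: the sample text is a shortened stand-in for the entries above, and the INFO line (including the class name com.geodyne.example.Heartbeat) is made up for the sake of the example.

```python
import re

# The pattern from the question: "ERROR", then the shortest run of
# characters up to and including the first newline.
ERROR_LINE = re.compile(r"ERROR.*?\n")

# Shortened stand-ins for the log entries above; the INFO entry is hypothetical.
sample_log = (
    "2011-05-10 02:01:11,799 [ThreadPool worker thread #21] ERROR "
    "com.geodyne.overt.runtime.engine.FlowObjectExecutionTreeNode - deliverException(...)\n"
    "[ErrorInfo[\n"
    "errorType=RECOVERABLE\n"
    "]]\n"
    "2011-05-10 02:01:12,001 [ThreadPool worker thread #21] INFO "
    "com.geodyne.example.Heartbeat - still alive\n"
    "2011-05-10 02:02:44,865 [ThreadPool worker thread #25] ERROR "
    "com.geodyne.magma.script.js.GeoJavaScriptException - GeoJavaScriptException(), nested exception:\n"
)

# Collect each match without its trailing newline.
matches = [m.group(0).rstrip("\n") for m in ERROR_LINE.finditer(sample_log)]
for text in matches:
    print(text)
```

Note that the pattern is case-sensitive and not tied to the log-level position, so an INFO entry whose message happened to contain the word ERROR in capitals would match as well.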

Now say that I want to categorize all errors with matching text to be of a particular "type." For example, multiple log entries with the text -

2011-05-10 02:01:11,799 [ThreadPool worker thread #21] ERROR com.geodyne.overt.runtime.engine.FlowObjectExecutionTreeNode - deliverException(...)com.geodyne.magma.GeoException: Runtime error in script ("Process: 'GeoDoc GeoCode LAVA Cache' ProcessItem: 'Get GeoCode Location' Type: 'ITEM'" 18:0).Internal Script error: com.geodyneinc.magma.common.util.service.exceptions.BaseSystemException: Error while calling calderaLocation or parsing the response from GeoCode because of HTML response

are a "type." Say there are 30-odd "types" of errors in the logs. I want to count how many errors of each "type" there are for a given time frame, and return the text of the first instance in the log of the 10 most frequently seen error "types", with a count of occurrences of each. For example, say there are 200 of the first type, 40 of the second, 12 of the third, and so on. Now, I'd like to graph that information, with a bar for each "type" that shows the error count and error text on mouse-over. Is there a way to do that? I've tried to do it with search-time and index-time field extraction, following suggestions given by kind list participants (links to original questions/answers below), but I can't make them work - no doubt because I've failed to articulate my problem and desired outcome adequately.

http://splunk-base.splunk.com/answers/24165/how-to-report-top-ten-errors-over-a-time-range

http://splunk-base.splunk.com/answers/24268/index-time-field-extraction-and-report-output-problems

My intent is to make a dashboard that has fields for entering the desired start date/timestamp, stop date/timestamp and a "Top Ten" button that when clicked, produces the report/graph directly without any further interaction from the user. Sources are the current individual logs and multiple gzipped archives of older logs. Any pointers would be appreciated.

sideview
SplunkTrust

I think you want to look at a couple of different areas.

For one thing, a simple search for the term "ERROR" will make any search here much more efficient. You don't want to dive into regexes immediately with nothing else, because Splunk will have to get every single event off disk to check it against the regex.
You probably want to throw a sourcetype term in there too, just because the word 'error' probably appears elsewhere. I'm also noticing that your events, despite the wrapping above, are actually single-line events, which means your regex doesn't really do anything that a simple search for 'ERROR' wouldn't do.

However, I could well be misunderstanding, and you may want to filter those results, possibly by the regex that you have there, possibly by some other search terms like ThreadPool or geodyne.

sourcetype=<your sourcetype> ERROR | regex _raw="<perhaps some regex>"

OK. Now you have lots of events in a search results set. There are several ways to differentiate them.

A) If 'type' can be interpreted to mean a unique combination of one or two "fields" in the data, and you're able to extract those fields reliably from all events...

For example, you might define type as the unique combination of the class throwing the exception and the class of the exception, and extract a "class" field whose values are like "com.geodyne.server.ejb.workflow.EJBWorkflowManagerBean", and an exceptionClass field whose values are like "server.ejb.workflow.impl.EJBWorkflowManagerBean.exception".

Then it's as simple as:

sourcetype=<your sourcetype> ERROR | regex _raw="<perhaps some regex>" | stats count by class, exceptionClass
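To make the idea of counting by a unique combination of fields concrete, here is a small Python sketch of the same grouping. It is an analogy only, not Splunk's implementation: count_by is a hypothetical helper, and the events are made-up dictionaries with shortened field values, assuming the two fields have already been extracted.

```python
from collections import Counter

# Hypothetical events with both fields already extracted; the values are
# shortened stand-ins, not real output from the logs in the question.
events = [
    {"class": "EJBWorkflowManagerBean", "exceptionClass": "WorkflowProcessItemException"},
    {"class": "EJBWorkflowManagerBean", "exceptionClass": "WorkflowProcessItemException"},
    {"class": "FlowObjectExecutionTreeNode", "exceptionClass": "GeoException"},
    {"class": "EJBWorkflowManagerBean", "exceptionClass": "GeoException"},
]


def count_by(rows, *fields):
    """Count rows per unique combination of the given field values."""
    return Counter(tuple(row[f] for f in fields) for row in rows)


counts = count_by(events, "class", "exceptionClass")

# One output row per distinct (class, exceptionClass) pair, largest first.
for combo, n in counts.most_common():
    print(n, *combo)
```

Each distinct pair of values becomes its own row with its own count, which is what lets a pair of fields act as the "type".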

If on the other hand the matching criteria are a lot more subtle, then you might want to look at eventtypes instead.

http://www.splunk.com/base/Documentation/4.2.1/Knowledge/Abouteventtypes

Eventtypes are a little mind-bending, but they allow you to do extremely sophisticated matching where matching on simple field values and aggregating on unique combinations of field values won't work. Beware that nothing prevents eventtypes from overlapping with each other; in fact this is one of their strengths. In any case where you're trying to get distinct sets, though, it can be a liability.
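The overlap caveat can be illustrated with a toy model in Python. This is not how Splunk defines or stores eventtypes: the names, the predicates, and the abbreviated event strings below are all invented for the example, on the assumption that an eventtype behaves like a named search that an event either matches or does not.

```python
# Hypothetical "event types": a name mapped to a predicate over the raw text.
EVENT_TYPES = {
    "geocode_html_response": lambda raw: "GeoCode" in raw and "HTML" in raw,
    "script_runtime_error": lambda raw: "Runtime error in script" in raw,
}


def tag(raw):
    """Return the names of every event type whose predicate matches."""
    return [name for name, matches in EVENT_TYPES.items() if matches(raw)]


# Abbreviated, made-up events loosely modelled on the question's logs.
events = [
    "ERROR ... Runtime error in script ... response from GeoCode because of HTML response",
    "ERROR ... Runtime error in script ... Error in persisting certificate images",
    "ERROR ... GeoCode returned an HTML response",
]

tags = [tag(e) for e in events]
per_type = {name: sum(name in t for t in tags) for name in EVENT_TYPES}

print(per_type)
print("events:", len(events), "sum of per-type counts:", sum(per_type.values()))
```

Because the first event matches both definitions, the per-type counts add up to more than the number of events, which is the liability when the goal is a set of distinct, non-overlapping error types.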

0 Karma

sideview
SplunkTrust

But stats count by class, exceptionClass doesn't only list out counts; it lists out the unique combinations of class and exceptionClass, with a count for each. I hope this makes sense. This is just the beginning, but it's the key to understanding Splunk reporting.

0 Karma

mwlarsen
Explorer

...so they wouldn't be useful in a count of error types. I'm trying to match text that's common to repeating errors, assign that text to a type, and then count how many of each type there are over a given time range.

0 Karma

mwlarsen
Explorer

Thanks, Nick. A type is all the matched text. The 1st error matches:

2011-05-10 02:01:11,799 [ThreadPool worker thread #21] ERROR com.geodyne.overt.runtime.engine.FlowObjectExecutionTreeNode - deliverException(...)com.geodyne.magma.GeoException: Runtime error in script ("Process: 'GeoDoc GeoCode LAVA Cache' ProcessItem: 'Get GeoCode Location' Type: 'ITEM'" 18:0).Internal Script error: com.geodyneinc.magma.common.util.service.exceptions.BaseSystemException: Error while calling calderaLocation or parsing the response from GeoCode because of HTML response

The details below that are unique...
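Taking "a type is all the matched text" literally, the desired top-ten report can be sketched in Python as below. This is a stand-in to pin down the logic, not a Splunk search: top_error_types is a hypothetical helper, the entries are invented, and the pattern ERROR[^\n]* is a small variation on the regex in the question that does not require a trailing newline.

```python
import re
from collections import Counter

# Text from "ERROR" to the end of that line; everything before it
# (timestamp, thread name) is left out of the type key.
ERROR_TEXT = re.compile(r"ERROR[^\n]*")


def top_error_types(entries, n=10):
    """Return (count, first_entry) for the n most frequent error types."""
    counts = Counter()
    first_seen = {}
    for entry in entries:
        m = ERROR_TEXT.search(entry)
        if m is None:
            continue  # no "ERROR" in this entry, so it is skipped
        key = m.group(0)
        counts[key] += 1
        # Remember the full text of the first entry seen for this type.
        first_seen.setdefault(key, entry)
    return [(count, first_seen[key]) for key, count in counts.most_common(n)]


# Made-up entries: two share the same matched text but differ in the details below it.
entries = [
    "2011-05-10 02:01:11,799 [ThreadPool worker thread #21] ERROR A - boom\n[ErrorInfo[\ntimeStamp=1\n]]",
    "2011-05-10 02:01:12,000 [ThreadPool worker thread #22] ERROR B - bang",
    "2011-05-10 02:01:13,000 [ThreadPool worker thread #23] ERROR A - boom\n[ErrorInfo[\ntimeStamp=2\n]]",
    "2011-05-10 02:01:14,000 [ThreadPool worker thread #21] INFO C - fine",
]

result = top_error_types(entries)
for count, first_entry in result:
    print(count, first_entry.splitlines()[0])
```

Keying on the matched text alone is what keeps the unique details underneath, and the differing timestamps and thread names in front, from splitting one type into many.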

0 Karma