Filter results before regex is applied

tven7 · 02-01-2011

I have an application log with a lot of entries.

I want to get only the lines containing the pattern "Exception:".

Some examples of lines in the log file are:

case1: java.text.ParseException: Unparseable date:

case2: com.pp.xyz.services.exception.UserException: Expected one record with user ID

The following does not work:

source="/home/xyz.log" "Exception:"

But the following matches case 2:

source="/home/xyz.log" "Exception"

A couple of questions regarding this:

Does Splunk ignore the case of the search term (here "Exception") and match it against "exception"?

Splunk does not seem to match partial patterns; searching for "Exception:" should have matched case 1. Why is this?

How do you get the initial search to match the pattern "Exception:"?

If I can get that to work, then I would do something like the following for the full solution, which is to capture all exceptions:

source="/home/xyz.log" "Exception:" | rex "\w+\.(?<exception>\w+Exception)" | timechart count by exception usenull=f
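The extraction step can be sanity-checked outside Splunk with Python's `re` module. This is a minimal sketch, assuming the simplified pattern `\w+\.(?<exception>\w+Exception)` (no trailing `.*?\n`, which would require a newline that a single-line event may not contain). Python spells named groups `(?P<name>...)`, and `collections.Counter` only stands in for the `count by exception` part of `timechart`, without any time buckets. The two input lines are the examples from the question.

```python
import re
from collections import Counter

# Python's re spells a named group (?P<name>...); Splunk's rex accepts (?<name>...).
EXCEPTION_RE = re.compile(r"\w+\.(?P<exception>\w+Exception)")

# The two example lines from the question.
LINES = [
    "java.text.ParseException: Unparseable date:",
    "com.pp.xyz.services.exception.UserException: Expected one record with user ID",
]

def extract_exception(line):
    """Return the first exception class name in the line, or None (like an unset field)."""
    match = EXCEPTION_RE.search(line)
    return match.group("exception") if match else None

# Rough analogue of "count by exception usenull=f": lines without a match are skipped.
counts = Counter(name for name in map(extract_exception, LINES) if name is not None)
print(counts)
```

Note that the regex itself is case-sensitive, so the lowercase `exception` package segment in case 2 is not captured; only `UserException` is.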

Lowell · 02-02-2011

The previous answers are right, but I'd like to point out that searching with a leading wildcard is much less efficient than searching with a trailing wildcard. In other words, looking for "Blah*" is pretty quick because Splunk can do an efficient lookup to find terms that start with "Blah". When searching for "*Blah", however, Splunk must scan all terms looking for ones that end with "Blah". This type of index lookup will always take longer, but you may or may not notice; that depends on how many unique terms your index contains.
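To make the prefix/suffix asymmetry concrete, here is a toy model in Python. It is only an illustration and not how Splunk's index is actually laid out: it assumes the indexed terms sit in a sorted list, so a "blah*" query can binary-search to the first candidate and stop at the first non-match, while a "*blah" query has to test every term. The term list is made up for the example.

```python
import bisect

# Hypothetical sorted, lowercased term list standing in for an index's term dictionary.
TERMS = sorted(["blah", "blahblah", "exception", "parseexception",
                "userexception", "java", "text", "zzz"])

def prefix_lookup(terms, prefix):
    """'prefix*': binary-search to the first candidate, then walk while the prefix matches.

    Returns (matching terms, number of terms examined after the binary search).
    """
    start = bisect.bisect_left(terms, prefix)
    result, examined = [], 0
    for term in terms[start:]:
        examined += 1
        if not term.startswith(prefix):
            break  # sorted order: no later term can share the prefix
        result.append(term)
    return result, examined

def suffix_lookup(terms, suffix):
    """'*suffix': sorted order does not help, so every term must be examined."""
    result = [term for term in terms if term.endswith(suffix)]
    return result, len(terms)

print(prefix_lookup(TERMS, "blah"))
print(suffix_lookup(TERMS, "exception"))
```

With eight terms the difference is trivial, but the suffix scan grows with the number of unique terms while the prefix walk only touches the matching range.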

So my suggestion would be to build a list of all possible exception types and put them into a big "OR" list:

Step 1: Figure out how many different "*Exception" patterns you really have in your data. (You may want to search over a long time period to make sure you don't miss any.)

source="/home/xyz.log" *Exception: | rex "\.(?<exception>\w+Exception:)" | dedup exception | table exception

Step 2: Take that list of terms and combine them into your original search, something like this:

source="/home/xyz.log" (ParseException: OR UserException: OR BlahException: OR ...)
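The two steps can be mimicked in Python to see what the generated clause looks like. This is a sketch, assuming the log lines are available locally as strings; it mirrors the Step 1 pattern `\.(?<exception>\w+Exception:)` (keeping the colon, as in the Step 2 search), and `distinct_exceptions` and `build_or_clause` are hypothetical helpers, not Splunk features. The sample lines are the two from the question, with the first repeated to show the dedup.

```python
import re

# Same idea as the Step 1 pattern: capture the class name plus the trailing colon.
EXCEPTION_RE = re.compile(r"\.(?P<exception>\w+Exception:)")

def distinct_exceptions(lines):
    """Step 1 analogue: extract exception names and dedup them, keeping first-seen order."""
    seen = {}
    for line in lines:
        for match in EXCEPTION_RE.finditer(line):
            seen.setdefault(match.group("exception"), None)
    return list(seen)

def build_or_clause(source, names):
    """Step 2 analogue (hypothetical helper): combine the names into one OR'd search string."""
    return f'source="{source}" ({" OR ".join(names)})'

lines = [
    "java.text.ParseException: Unparseable date:",
    "com.pp.xyz.services.exception.UserException: Expected one record with user ID",
    "java.text.ParseException: Unparseable date:",  # repeated to show the dedup
]
names = distinct_exceptions(lines)
print(build_or_clause("/home/xyz.log", names))
```

Regenerating the clause whenever new exception types appear is the maintenance cost of this approach.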

Assuming you don't have all that many exception types, you should end up with a faster search.

You'll also have to ask yourself: how often do new exception types show up? Which is preferable: (1) good performance, with the possibility of missing events when new exception types show up, or (2) never missing events, but a slower search?

There's a helpful video about segmentation here:

Paolo_Prigione · 02-01-2011

1) Yes, Splunk search is case-insensitive for indexed terms. However, Boolean operators (AND, OR, NOT) MUST be written in uppercase, and field names MUST be written exactly as they appear.

2) Splunk matches partial patterns if you put an asterisk in them (as gkanapathy said). The colon in "Exception:" is considered a "segmenter", i.e. something that breaks up words. But you should still be able to get results with the search in 3).

3) "*Exception:" should do it.
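A toy model of points 1) to 3): why a plain "Exception" search can hit case 2 but not case 1, and why a leading wildcard catches both. This is a simplification, assuming the index stores lowercased terms produced by splitting on spaces, dots and colons only; Splunk's real segmentation rules (major and minor breakers) are more involved, so the breaker set here is an assumption. In this model case 2 matches only because of its lowercase `exception` package segment.

```python
import fnmatch
import re

# Assumed breaker set for this toy model; real Splunk segmentation has more rules.
BREAKERS = re.compile(r"[\s.:]+")

def terms(line):
    """Split a line into lowercased terms, roughly how a case-insensitive index stores them."""
    return {t.lower() for t in BREAKERS.split(line) if t}

def matches(line, query):
    """True if any term matches the query; '*' is a wildcard, comparison is case-insensitive."""
    pattern = query.lower()
    return any(fnmatch.fnmatchcase(term, pattern) for term in terms(line))

case1 = "java.text.ParseException: Unparseable date:"
case2 = "com.pp.xyz.services.exception.UserException: Expected one record with user ID"

for query in ("Exception", "*Exception"):
    print(query, matches(case1, query), matches(case2, query))
```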

gkanapathy · 02-01-2011

Use:

source="/home/xyz.log" *Exception

tven7 · 02-02-2011

The performance of this is really bad, even going back just 4 hours, which is not a lot of data (< 300 MB). I guess I/O is to blame, since nothing else is contending for resources on the server. Previously I was doing "Exception" NOT XYZPAttern, and that performed well, but it was missing some lines like case 1.

Thank you for the help.

jrodman · 02-01-2011

Perhaps one might prefer "*Exception:"
