Attempting to follow the example on the Splunk doc site, I set up an index-time field extraction (called "topten") to parse and count specific errors. It doesn't appear to be working, because performing a search on a log piped to "timechart count by topten" produces data that includes "INFO," "WARN," and others, contrary to what my regex specifies. I only want "ERROR"s reported. Something is clearly wrong, probably with my regex, although it works perfectly in Regex Coach. I'm not convinced I fully understand the example on the Splunk doc site; I'm probably failing to grasp a critical concept.
I'm also having a problem with the report. It appears to group and graph the entries that were counted, but a mouse-over on the bars shows "NULL" and a count of whatever was counted (INFO, WARN, ERROR). What I want is for a mouse-over to show the log text the regex matched, along with the count. My index-time field extraction config is below. If someone can steer me in the right direction, I'd appreciate it.
transforms.conf:
[topten]
REGEX = ERROR.*?\n
FORMAT = topten::"$1"
WRITE_META = true
fields.conf:
[topten]
INDEXED = true
props.conf:
[log4jlog]
TRANSFORMS-topten = topten
It's difficult to help without sample data. If you're confident the regex should work, then the problems look to be:
1. In the FORMAT clause, remove the quotes surrounding $1. It's a variable, not a string: it references the first capture group in the regex, that is, whatever is surrounded by ().
2. The portion of your REGEX clause that you want to end up in $1 needs to be inside a capture group, so enclose the part of the regex that matches the data you want in parentheses, like this:
ERROR(.*?)\n
This would capture anything between ERROR and newline into topten.
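Putting both changes together, the index-time stanza might look like the sketch below. It assumes the rest of your regex is right for your data, which I can't verify without samples; fields.conf and props.conf would stay as you have them.

```ini
# transforms.conf (sketch; same stanza name as in the question)
[topten]
# The parentheses create capture group 1, which FORMAT references as $1
REGEX = ERROR(.*?)\n
# No quotes around $1
FORMAT = topten::$1
WRITE_META = true
```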
On a side note, is there a reason you are doing this at index time? I would recommend using a search-time extraction like this, and scrapping your fields.conf and transforms.conf entries:
props.conf:
[log4jlog]
EXTRACT-topten = ERROR(?<topten>.*?)\n
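Keep in mind that a field extraction only adds the field to events where the regex matches; it doesn't filter other events out of the search. Here is a sketch of searches that restrict the results to ERROR events before reporting, assuming log4jlog is your sourcetype name (taken from your props.conf stanza) and the topten extraction is in place:

```spl
sourcetype=log4jlog ERROR topten=* | timechart count by topten

sourcetype=log4jlog ERROR topten=* | top limit=10 topten
```

The topten=* term keeps only events where the field was actually extracted, which should also get rid of the NULL series in the timechart.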
Cheers
Thanks for your response, Ron. I used index time because a response to my original question at:
http://splunk-base.splunk.com/answers/24165/how-to-report-top-ten-errors-over-a-time-range
suggested it was the only way to accomplish my goal. What I want to do is search through a log4j formatted file, gather all ERRORs, sort them by type (based on error text), count the instances of each error type, and return one example each of the 10 errors that repeat the most (top ten). Both index-time and search-time suggestions are returning log entries other than ERRORs, which isn't what I'm after.
One more note: an index-time extraction will not apply to data that has already been indexed; it will only operate on new data being indexed after the extraction is defined. It will also require you to restart Splunk to activate it. IMO, use a search-time extraction. Search-time changes to props.conf can be loaded using this search syntax: "| extract reload=t"
Another side note: you can test a search-time extraction in the UI using the REX command. If you search, "* | rex "ERROR(?