This is a two-part question about isolating metric data within a multi-line event, where the strings that identify a metric may be split across two different sections of a log file.
The log file format puts a date/time stamp at the top of each event and then dumps information about the application line by line.
A single event can exceed 23k lines and 700k characters. Here's a log containing a single isolated event.
(FYI, this is a health.log dump from an HP NNMi management server.)
$ wc -cl health.log
23518 700343 health.log
Here's a mock-up of the problem.
2014-06-17 10:22:13,795 INFO com.hp.ov.nms.health.log NNMi System Health Report
Date: 2014-06-17 10:22:11.572
Overall Status: Normal
Policy Count = 523
Policy Count = 23
Part 1) Large Event Issue
I found in a post on the forum that you can modify the inputs.conf file with a 'maxchars' value so the events don't get cut. Is this the best way to handle this, or would it be better to hack up the event? My concern here is that an event is never guaranteed to be the same size and will almost always vary.
Part 2) Multi-line Parsing Issue
As you can see in the snippet above, there are preceding headers that identify the metrics I'm trying to extract. These headers are not contiguous and may or may not have additional sub-headers, as identified above.
StatePoller → Collection Manager → Policy Count
CustomPoller → Instance Discovery
→ Collection Manager → Policy Count
I'm trying to understand what the best method for parsing out these different metrics would be.
Please let me know if I can provide any further detail. I can send a sample log if needed.
The values for allowing very large events are in props.conf:
TRUNCATE = maximum length of an event, in bytes (default 10000)
MAX_EVENTS = maximum number of lines in an event (default 256)
As for parsing your data, you'll likely need more or less complex regular expressions.
Do you only need a few values from that large event, or do you need the entire event in Splunk?
The event is packed with hundreds of metrics that would be useful, and most of these metric descriptions are uniquely defined on a single line. The example I provided is one of the more complex problems I've come across. I've been able to extract this with a Perl parser by capturing the various headers, concatenating them, and then testing for a match using a "next unless" expression. I'm just trying to figure out what the best method for this type of evaluation might be using Splunk. I'm new to the product.
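For readers who want to see the header-tracking idea spelled out, here is a minimal sketch in Python (not the original Perl). It assumes a layout the mock-up does not actually show: each header on its own line, and metrics written as "name = value". The sample text, the TOP_LEVEL set, and the function name extract_metrics are all hypothetical.

```python
import re

# Hypothetical layout: one header per line, metrics as "name = value".
SAMPLE = """\
StatePoller
Collection Manager
Policy Count = 523
CustomPoller
Instance Discovery
Collection Manager
Policy Count = 23
"""

# Assumed names of top-level sections; seeing one resets the header path.
TOP_LEVEL = {"StatePoller", "CustomPoller"}

# A metric line: anything before "=" is the name, digits after it the value.
METRIC = re.compile(r"^(?P<name>[^=]+?)\s*=\s*(?P<value>\d+)$")


def extract_metrics(text):
    """Return {"Header > Sub-header > Metric": value} for every metric line."""
    path = []      # headers seen since the last top-level section
    metrics = {}
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        match = METRIC.match(stripped)
        if match:
            # Concatenate the current headers with the metric name.
            key = " > ".join(path + [match.group("name")])
            metrics[key] = int(match.group("value"))
        elif stripped in TOP_LEVEL:
            path = [stripped]
        else:
            # Any other non-metric line is treated as a nested sub-header.
            path.append(stripped)
    return metrics


if __name__ == "__main__":
    for key, value in extract_metrics(SAMPLE).items():
        print(key, "=", value)
```

One limitation of this sketch: without indentation or some other depth marker it cannot tell a sibling sub-header from a child, so every sub-header is simply appended to the path until the next top-level section.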
You can use line breaks within a field extraction regex, e.g. like this:
CustomPoller[\n\r]+Instance Discovery[\n\r]+Collection Manager[\n\r]+Policy Count = (?<fieldname>\d+)
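To try a pattern like this outside Splunk before saving it as a field extraction, a small Python check can help. This is only a sketch: the event text is a hypothetical layout with each header on its own line, and the group name policy_count is made up. Note that Python's re module needs the (?P<name>...) spelling for named groups, while Splunk's PCRE accepts (?<name>...).

```python
import re

# Hypothetical event fragment; the real health.log layout may differ.
EVENT = (
    "StatePoller\n"
    "Collection Manager\n"
    "Policy Count = 523\n"
    "CustomPoller\n"
    "Instance Discovery\n"
    "Collection Manager\n"
    "Policy Count = 23\n"
)

# Same pattern as above, with Python's named-group syntax.
PATTERN = re.compile(
    r"CustomPoller[\n\r]+Instance Discovery[\n\r]+"
    r"Collection Manager[\n\r]+Policy Count = (?P<policy_count>\d+)"
)


def custom_poller_policy_count(event):
    """Return the CustomPoller policy count as an int, or None if absent."""
    match = PATTERN.search(event)
    return int(match.group("policy_count")) if match else None


if __name__ == "__main__":
    print(custom_poller_policy_count(EVENT))
```

The pattern only matches when the four lines are directly adjacent and carry no leading or trailing whitespace. If other lines can appear between the headers, the [\n\r]+ separators would have to be loosened.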