Best methods for handling large events and multi-line parsing issues?

ekrieser
Engager

This is a two-part question about isolating metric data within a multi-line event, where the strings that identify a metric may be split across two different sections of the log file.

The Log

The log file format includes a date/time stamp at the top of each event and then proceeds to dump information about the application line by line.

A single event can exceed 23k lines and 700k chars. Here's a log with a single isolated event.

(FYI, this is a health.log dump from an HP NNMi management server)

$ wc -cl health.log

23518 700343 health.log

Here's a mock-up of the problem.


2014-06-17 10:22:13,795 INFO com.hp.ov.nms.health.log NNMi System Health Report
Hostname: somehost.com

Date: 2014-06-17 10:22:11.572

Overall Status: Normal

StatePoller

Collection Manager

Policy Count = 523

....

CustomPoller

Instance Discovery

Collection Manager

Policy Count = 23

...


Part 1) Large Event Issue

I found in a post on the forum that you can modify the inputs.conf file with a 'maxchars' value so the events don't get truncated. Is this the best way to handle this, or would it be better to split the event up? My concern is that events are never guaranteed to be the same size and will almost always vary.

Part 2) Multi-line Parsing Issue

As you can see in the snippet above, the metrics I'm trying to extract are identified by the headers that precede them. These headers are not contiguous and may or may not have additional sub-headers, as shown above.

StatePoller → Collection Manager → Policy Count

CustomPoller → Instance Discovery

             → Collection Manager → Policy Count

I'm trying to understand what the best method for parsing out these different metrics would be.

Please let me know if I can provide any further detail. I can send a sample log if needed.

Thanks

Eric


martin_mueller
SplunkTrust

You can match line breaks within a field extraction regex, for example:

CustomPoller[\n\r]+Instance Discovery[\n\r]+Collection Manager[\n\r]+Policy Count = (?<fieldname>\d+)
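
As a rough sketch of how that pattern could be used at search time, the `rex` command can apply it to the raw event text. The sourcetype name `nnmi_health` and the field names `custompoller_policy_count` and `statepoller_policy_count` are placeholders, not something from the original log. The second `rex` assumes the StatePoller section has the shorter header chain shown in the mock-up.

```spl
sourcetype=nnmi_health
| rex "CustomPoller[\n\r]+Instance Discovery[\n\r]+Collection Manager[\n\r]+Policy Count = (?<custompoller_policy_count>\d+)"
| rex "StatePoller[\n\r]+Collection Manager[\n\r]+Policy Count = (?<statepoller_policy_count>\d+)"
| table _time custompoller_policy_count statepoller_policy_count
```

Because each pattern starts at its own top-level header, the two Policy Count values land in separate fields even though the "Collection Manager" and "Policy Count" lines are identical. If the real log has indentation or trailing whitespace on those lines, `[\n\r]+` would probably need to be loosened to something like `\s+`.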

martin_mueller
SplunkTrust

Great. I've converted this into an answer so you can mark it as solved.

ekrieser
Engager

Thanks Martin. I think that's what I'm looking for.

ekrieser
Engager

The event is packed with hundreds of metrics that would be useful, and most of these metric descriptions are uniquely defined on a single line. The example I provided is one of the more complex cases I've come across. I've been able to extract this with a Perl script by capturing the various headers, concatenating them, and then testing for a match using a "next unless" expression. I'm just trying to figure out what the best method for doing this type of evaluation in Splunk might be. I'm new to the product.

martin_mueller
SplunkTrust

The values for allowing very large events are in props.conf:

TRUNCATE = max length of an event, in bytes (default 10000)
MAX_EVENTS = max lines in an event (default 256)

As for parsing your data, you'll likely need more or less complex regular expressions.
Do you only need a few values from that large event, or do you need the entire event in Splunk?
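
For the event described in the question (roughly 23.5k lines and 700k characters), a props.conf stanza along these lines might be a starting point. The sourcetype name `nnmi_health` and the specific limits are assumptions chosen only to leave some headroom above that one sample; they are not tested values.

```ini
# props.conf (sketch)
# [nnmi_health] is a hypothetical sourcetype name
[nnmi_health]
# Raise the size limit above the ~700k seen in the sample (default 10000)
TRUNCATE = 1000000
# Raise the line limit above the ~23.5k lines seen in the sample (default 256)
MAX_EVENTS = 30000
```

Since the event size varies, the limits need enough margin for the largest report you expect. Setting TRUNCATE = 0 disables truncation entirely, at the cost of losing the safeguard against runaway events. These settings are applied when data is parsed for indexing, so they only affect events indexed after the change.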
