Getting Data In

Best methods for handling large events and multi-line parsing issues?

ekrieser
Engager

This is a two-part question about isolating metric data within a multi-line event, where the strings that identify a metric may be spread across two different sections of the log file.

The Log

The log file starts each event with a date/time stamp and then proceeds to dump information about the application, line by line.

A single event can exceed 23k lines and 700k characters. Here's a log containing a single isolated event.

(FYI, this is a health.log dump from an HP NNMi management server)

$wc -cl health.log

23518 700343 health.log

Here's a mock up of the problem.


2014-06-17 10:22:13,795 INFO com.hp.ov.nms.health.log NNMi System Health Report
Hostname: somehost.com

Date: 2014-06-17 10:22:11.572

Overall Status: Normal

StatePoller

Collection Manager

Policy Count = 523

....

CustomPoller

Instance Discovery

Collection Manager

Policy Count = 23

...


Part 1) Large Event Issue

I found in a post on the forum that you can modify the inputs.conf file with a 'maxchars' value so the events don't get truncated. Is this the best way to handle this, or would it be better to split the event into smaller pieces? My concern is that events are never guaranteed to be the same size and will almost always vary.

Part 2) Multi-line Parsing Issue

As you can see in the snippet above, the metrics I'm trying to extract are identified by the headers that precede them. These headers are not contiguous and may or may not have additional sub-headers, as in these two cases:

StatePoller → Collection Manager → Policy Count

CustomPoller → Instance Discovery → Collection Manager → Policy Count

I'm trying to understand what the best method for parsing out these different metrics would be.

Please let me know if I can provide any further detail. I can send a sample log if needed.

Thanks

Eric

martin_mueller
SplunkTrust

You can use line breaks within a field extraction regex, e.g. like this:

CustomPoller[\n\r]+Instance Discovery[\n\r]+Collection Manager[\n\r]+Policy Count = (?<fieldname>\d+)
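If you want to sanity-check a pattern like this outside Splunk before putting it into a field extraction, here is a minimal Python sketch run against the mock-up from the question. The field names `state_policy_count` and `custom_policy_count` are made up for illustration, and the StatePoller pattern is just the same idea applied to the shorter header chain. Note that Python's `re` spells named groups `(?P<name>...)`. The sketch also assumes header lines carry no trailing whitespace, which may not hold in the real log.

```python
import re

# Mock event copied from the question; the elided "...." sections are left as-is.
EVENT = """2014-06-17 10:22:13,795 INFO com.hp.ov.nms.health.log NNMi System Health Report
Hostname: somehost.com

Date: 2014-06-17 10:22:11.572

Overall Status: Normal

StatePoller

Collection Manager

Policy Count = 523

....

CustomPoller

Instance Discovery

Collection Manager

Policy Count = 23

...
"""

# [\n\r]+ swallows one or more line breaks, so blank lines between headers are fine.
# Field names below are hypothetical, chosen only for this example.
STATE = re.compile(
    r"StatePoller[\n\r]+Collection Manager[\n\r]+"
    r"Policy Count = (?P<state_policy_count>\d+)"
)
CUSTOM = re.compile(
    r"CustomPoller[\n\r]+Instance Discovery[\n\r]+"
    r"Collection Manager[\n\r]+Policy Count = (?P<custom_policy_count>\d+)"
)

state_policy_count = int(STATE.search(EVENT).group("state_policy_count"))
custom_policy_count = int(CUSTOM.search(EVENT).group("custom_policy_count"))
print(state_policy_count, custom_policy_count)
```

Because each pattern spells out the full header chain, the two "Policy Count" lines cannot be confused with each other.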

martin_mueller
SplunkTrust

Great. I've converted this into an answer so you can mark it as solved.

ekrieser
Engager

Thanks Martin. I think that's what I'm looking for.

ekrieser
Engager

The event is packed with hundreds of metrics that would be useful, and most of these metric descriptions are uniquely defined on a single line. The example I provided is one of the more complex problems I've come across. I've been able to extract this with a Perl parser by capturing the various headers, concatenating them, and then testing for a match using a "next unless" expression. I'm just trying to figure out what the best method for doing this type of evaluation might be in Splunk. I'm new to the product.
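For reference, the header-tracking idea described above could be sketched roughly like this, in Python rather than Perl. This is not the original script. The function name `extract_metrics`, the `top_level` parameter, and the " > " path separator are all assumptions made for illustration. It also assumes every non-metric line after a top-level header is a nested sub-header, which fits the mock-up but may not fit the real health.log.

```python
import re

# A metric line looks like "Policy Count = 523" in the mock-up.
METRIC = re.compile(r"^(?P<name>[^=]+?)\s*=\s*(?P<value>\d+)$")


def extract_metrics(event, top_level):
    """Map 'Header > Sub-header > Metric' paths to integer values."""
    metrics = {}
    path = []  # headers seen since the last top-level header
    for raw in event.splitlines():
        line = raw.strip()
        if not line or set(line) == {"."}:
            continue  # skip blank lines and "...." elisions
        match = METRIC.match(line)
        if match:
            # Concatenate the captured headers with the metric name.
            key = " > ".join(path + [match.group("name")])
            metrics[key] = int(match.group("value"))
        elif line in top_level:
            path = [line]  # a new top-level section starts a new path
        elif path:
            path.append(line)  # treat as a sub-header of the current section
    return metrics


SAMPLE = """Overall Status: Normal
StatePoller
Collection Manager
Policy Count = 523
....
CustomPoller
Instance Discovery
Collection Manager
Policy Count = 23
"""

result = extract_metrics(SAMPLE, top_level={"StatePoller", "CustomPoller"})
for key, value in result.items():
    print(key, "=", value)
```

Lines that appear before any known top-level header (such as "Overall Status: Normal") are ignored, so the list of top-level names has to be supplied up front.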

martin_mueller
SplunkTrust

The settings that allow very large events are in props.conf:

TRUNCATE = max length of an event in bytes (default 10000)
MAX_EVENTS = max number of lines in an event (default 256)

As for parsing your data, you'll likely need more or less complex regular expressions.
Do you only need a few values from that large event, or do you need the entire event in Splunk?
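To get a feel for how far past those defaults a given event is, a small Python sketch like the following can help. It only does the arithmetic. The helper names `measure_event` and `exceeds_defaults` are invented for this example, and the sizes plugged in at the end are the `wc -cl` figures from the question.

```python
# Defaults as quoted above; both can be raised per sourcetype in props.conf.
DEFAULT_TRUNCATE = 10000   # bytes per event
DEFAULT_MAX_EVENTS = 256   # lines per event


def measure_event(text):
    """Return (line_count, byte_count) for one event's raw text."""
    return len(text.splitlines()), len(text.encode("utf-8"))


def exceeds_defaults(line_count, byte_count):
    """Report which default limit an event of this size would hit."""
    return {
        "TRUNCATE": byte_count > DEFAULT_TRUNCATE,
        "MAX_EVENTS": line_count > DEFAULT_MAX_EVENTS,
    }


# Sizes taken from the question: 23518 lines, 700343 bytes.
report = exceeds_defaults(23518, 700343)
print(report)
```

Since event sizes vary, it is probably worth measuring several real events and leaving generous headroom rather than sizing the limits to a single sample.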
