Splunk Enterprise

How to not include a pattern when line merging?

ricotries
Communicator

I have a log from an application that isn't structured in any standard format, and I am struggling to drop certain lines at index time because of the line merging configuration.

This is a pseudo sample of the data:

----- <application version> -----
(<timestamp>) <data> 
(<timestamp>) <data>

----- <application version> -----
(<timestamp>) <data> 
(<timestamp>) <data>

----- <application version> -----
(<timestamp>) <data> 
(<timestamp>) <data>
<data>
<data>
<data>
(<timestamp>) <data>

As you can see, some events span multiple lines, so breaking on the timestamp seemed like the best approach. This is the props.conf I wrote for this source type:

[my_new_sourcetype]
SHOULD_LINEMERGE = true
BREAK_ONLY_BEFORE_DATE = true
TRANSFORMS-drop_header = new_sourcetype_drop_header

And the associated transforms.conf:

[new_sourcetype_drop_header]
REGEX = ^-{5}.+-{5}$
DEST_KEY = queue
FORMAT = nullQueue

 

The problem is that once the data is indexed, any <application version> header that ended up as its own event is dropped, but other headers are merged into the preceding event, producing events with a linecount of 2 that look like this:

(<timestamp>) <data>
----- <application version> -----

 

How do I force it so that the <application version> header is always made into its own event so that it can be dropped by the transforms configuration?


richgalloway
SplunkTrust

Rather than breaking only before a date, try breaking before a date OR a header.

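[my_new_sourcetype]
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+)(\(<<timestamp regex>>|-{5})
TRANSFORMS-drop_header = new_sourcetype_drop_header
TIME_FORMAT = <<timestamp format>>
TIME_PREFIX = \(

Be sure to replace <<timestamp regex>> with a regular expression that matches your timestamp strings, and <<timestamp format>> with the corresponding strptime-style format string (TIME_FORMAT takes a format such as %Y-%m-%d %H:%M:%S, not a regex). The \( in LINE_BREAKER accounts for the opening parenthesis that precedes each timestamp in your sample. With SHOULD_LINEMERGE = false, LINE_BREAKER alone decides where events begin, so each header becomes its own event and your existing transform can route it to nullQueue.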
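If you want to sanity-check the break pattern before deploying it, a rough offline approximation in Python can help. This is only a sketch: the timestamp layout (2024-01-15 10:30:00) and the sample text are made up for illustration, Python's re module is not the PCRE engine Splunk uses, and Splunk does not strip whitespace the way this script does. It splits on newlines that are followed by an opening parenthesis plus timestamp, or by a run of five dashes, then drops events that match the header regex from transforms.conf.

```python
import re

# Opening parenthesis plus a HYPOTHETICAL timestamp layout such as
# "(2024-01-15 10:30:00)". Replace with a pattern for your real timestamps.
TIMESTAMP = r"\(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\)"

# Break on newlines that are followed by a timestamp or a header.
# A lookahead is used so re.split() leaves that text at the start of
# the next event, similar to how LINE_BREAKER discards only group 1.
BREAKER = re.compile(r"[\r\n]+(?=" + TIMESTAMP + r"|-{5})")

# Same pattern as REGEX in the new_sourcetype_drop_header stanza.
HEADER = re.compile(r"^-{5}.+-{5}$")


def split_events(raw):
    """Approximate event breaking, then drop header-only events."""
    events = [e.strip() for e in BREAKER.split(raw) if e.strip()]
    return [e for e in events if not HEADER.search(e)]


# Made-up sample that mirrors the structure described in the question.
SAMPLE = (
    "----- app 1.0 -----\n"
    "(2024-01-15 10:30:00) first\n"
    "(2024-01-15 10:30:01) second\n"
    "continued\n"
    "continued again\n"
    "\n"
    "----- app 1.0 -----\n"
    "(2024-01-15 10:31:00) third\n"
)

for event in split_events(SAMPLE):
    print(repr(event))
```

If a header still shows up glued to the previous event in this check, the break pattern is not matching your header or timestamp lines, and the same would likely happen at index time.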

