Getting Data In

Event pattern for sourcetype

krishnani
New Member

I'm troubleshooting some issues with one sourcetype and realized that Splunk is not breaking the events correctly. The format of these events is a little different, but there are clear boundaries: each event is prefixed by =LOGLEVEL REPORT====Date==== and ends with two line feeds. It would be nice if Splunk could split events on these boundaries. Specifically, I'd like to:

  1. Break events based on these boundaries
  2. Define a logLevel field based on the text before "REPORT"

Example events:
=TYPE REPORT==== 23-May-2016::16:19:05 ===
HTTP access requested:XXXXXX

How should I configure props.conf for this?

1 Solution

jkat54
SplunkTrust
[sourcetypeName]
TIME_PREFIX=\w+\sREPORT====
SHOULD_LINEMERGE=false
NO_BINARY_CHECK=true
LINE_BREAKER=(=)\w+\s\w+====
EXTRACT-loglevel=^(?<loglevel>\w+)

This method assumes "TYPE" in your example was the loglevel.

Works fine with sample data I created based on your examples:

=ERROR REPORT==== 23-May-2016::16:19:05 ===
HTTP access requested:XXXXXX
=WARN REPORT==== 23-May-2016::16:12:05 ===
HTTP access requested:XXXXXX
HTTP access requested:XXXXXX
HTTP access requested:XXXXXX

And it uses a LINE_BREAKER instead of SHOULD_LINEMERGE=true, which means the events skip the line-merging stage of the parsing pipeline, so it speeds up data ingestion and reduces resource usage.
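If you want to sanity-check those regexes outside Splunk, here is a small Python sketch (purely hypothetical, not part of Splunk itself) that simulates LINE_BREAKER's behavior: Splunk breaks the stream at each match and discards the text captured by the first group, which is why the leading "=" disappears from each event.

```python
import re

# Sample data shaped like the events in this thread
raw = (
    "=ERROR REPORT==== 23-May-2016::16:19:05 ===\n"
    "HTTP access requested:XXXXXX\n"
    "=WARN REPORT==== 23-May-2016::16:12:05 ===\n"
    "HTTP access requested:XXXXXX\n"
    "HTTP access requested:XXXXXX\n"
)

# LINE_BREAKER=(=)\w+\s\w+====
# Splunk breaks before each match and drops whatever the FIRST capture
# group matched (here, the leading "=").
line_breaker = re.compile(r"(=)\w+\s\w+====")

events = []
pos = 0
for m in line_breaker.finditer(raw):
    if m.start(1) > pos:           # close out the previous event
        events.append(raw[pos:m.start(1)])
    pos = m.end(1)                 # resume AFTER the captured "=", so it is dropped
events.append(raw[pos:])           # last event runs to end of stream

# EXTRACT-loglevel=^(?<loglevel>\w+) -- first word of each event
levels = [re.match(r"\w+", ev).group(0) for ev in events]
print(levels)  # ['ERROR', 'WARN']
```

Note that the breaker regex never matches inside the event body or the trailing "===" of the timestamp, because it requires word characters immediately after the "=".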

This also removes the beginning "=" sign from each event, but hey... that's what we call license optimization where I come from. 😉



krishnani
New Member

Thanks guys 🙂



lguinn2
Legend

Well, new events do not always begin with a literal "=LOGLEVEL REPORT====" as your example shows (unless "TYPE" is a log level, or the example is abstract). But I would do this in props.conf:

[yoursourcetypehere]
TIME_PREFIX = \=\w+ REPORT====
MAX_TIMESTAMP_LOOKAHEAD=35
TIME_FORMAT=%d-%b-%Y::%H:%M:%S
EXTRACT-e1 = \=(?<loglevel>\w+) REPORT====
MAX_EVENTS = 500

This should actually be enough to get the events broken out correctly, with the right timestamp on each event. While it would be more efficient to create a LINE_BREAKER to precisely identify the event boundary, I don't recommend that if you are new to Splunk or inexperienced with regular expressions.
By default, Splunk considers the line containing the timestamp to be the first line of the event. That default should work fine in your case.

BREAK_ONLY_BEFORE_DATE = true   #is the default

Note that I also included a setting for MAX_EVENTS. This controls the maximum number of lines per event (it isn't well named). The default is 128 lines per event - if Splunk is not separating events properly, this also could be the cause. I set the limit to 500 arbitrarily, but you should make sure that it is set to something reasonable for your data.
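As a quick offline sanity check on the TIME_FORMAT string (a hypothetical snippet, run outside Splunk): Python's strptime uses the same conversion specifiers, so you can confirm the format actually matches the sample timestamp before deploying the stanza.

```python
from datetime import datetime

# TIME_FORMAT=%d-%b-%Y::%H:%M:%S -- Python's strptime shares these
# conversion specifiers, so it can validate the format string.
stamp = "23-May-2016::16:19:05"  # timestamp from the sample event
parsed = datetime.strptime(stamp, "%d-%b-%Y::%H:%M:%S")
print(parsed)  # 2016-05-23 16:19:05
```

The sample timestamp is 25 characters counting from the end of the TIME_PREFIX match, so MAX_TIMESTAMP_LOOKAHEAD=35 gives comfortable headroom.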
