Issue filtering events with regex and props.conf

xwill13
Engager

Hello, I am trying to figure out how to edit props.conf so that it splits my events properly. The events are added to a log file, which looks like this:

 

******************************************************************************

Mon 01/02/2023
09:00 AM


******************************************************************************

The command completed successfully.

1 file(s) copied.
\\share\folder\folder\folder\file
1 file(s) copied.
1 file(s) copied.


******************************************************************************

Tue 01/03/2023
09:00 AM


******************************************************************************

The command completed successfully.

The system cannot find the file specified.
\\share\folder\folder\folder\file
0 file(s) copied.
The system cannot find the file specified.


******************************************************************************

Wed 01/04/2023
09:00 AM


******************************************************************************

The command completed successfully.

1 file(s) copied.
\\share\folder\folder\folder\file
1 file(s) copied.
1 file(s) copied.


******************************************************************************

Thu 01/05/2023
09:00 AM


******************************************************************************

The command completed successfully.

1 file(s) copied.
\\share\folder\folder\folder\file
1 file(s) copied.
1 file(s) copied.


******************************************************************************




I would like my events to look like this:

******************************************************************************

Mon 01/02/2023
09:00 AM


******************************************************************************

The command completed successfully.

1 file(s) copied.
\\share\folder\folder\folder\file
1 file(s) copied.
1 file(s) copied.

It seems like no matter what I try, I can't get Splunk to separate it properly.

The file updates daily. I have been testing my settings by uploading a copy of the text file directly, and then configuring Splunk to monitor the file for continuous updates.

 

Typically the preview for the uploaded file looks somewhat acceptable, like this:

Mon 01/02/2023
09:00 AM


******************************************************************************

The command completed successfully.

1 file(s) copied.
\\share\folder\folder\folder\file
1 file(s) copied.
1 file(s) copied.

 

This output would work; however, I noticed that it consistently cuts off the first line of text. The real problem comes with the monitoring process.

It tends to split the data in a way that seems almost random, and definitely isn't matching my regex settings.

The date, the asterisks, and the text get placed into separate events for reasons I don't understand.

 

My props.conf settings are displayed below:

[log_file_test]
BREAK_ONLY_BEFORE = \*{78}\s*[a-zA-z]{3}\s\d{2}\/\d{2}\/\d{2}\/\d{4}
NO_BINARY_CHECK = 1
SHOULD_LINEMERGE=1
category=custom
pulldown_type=1
disabled=false


Any clues as to what I might be doing wrong or neglecting?

 


LRF
Path Finder

Hi @xwill13 ,

These are the props settings that I would use for your input:

 

[log_file_test]
MAX_TIMESTAMP_LOOKAHEAD=0
TIME_PREFIX=^
SHOULD_LINEMERGE=false
LINE_BREAKER=(\*{78}[\r\n]+)[a-zA-Z]+\s\d+/\d+/\d+
TRUNCATE=500
NO_BINARY_CHECK=true

 

It is usually better to set LINE_BREAKER together with SHOULD_LINEMERGE=false. Otherwise Splunk first breaks the input into single lines (using the default line breaker) and then consumes resources merging them back into events.
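A rough way to see what this LINE_BREAKER does is to emulate it outside Splunk. The Python sketch below is only an approximation, not Splunk's parser: it assumes an event boundary at every match, with the text matched by the first capture group thrown away. The names SAMPLE, split_events and ORIGINAL_PATTERN are made up for the illustration, and the sample is two days of the log from the question.

```python
import re

SEPARATOR = "*" * 78  # the 78-asterisk divider line from the log

# Two days of the log from the question, rebuilt line by line.
SAMPLE = "\n".join([
    SEPARATOR, "", "Mon 01/02/2023", "09:00 AM", "", "",
    SEPARATOR, "", "The command completed successfully.", "",
    "1 file(s) copied.", r"\\share\folder\folder\folder\file",
    "1 file(s) copied.", "1 file(s) copied.", "", "",
    SEPARATOR, "", "Tue 01/03/2023", "09:00 AM", "", "",
    SEPARATOR, "", "The command completed successfully.", "",
    "The system cannot find the file specified.",
    r"\\share\folder\folder\folder\file", "0 file(s) copied.",
    "The system cannot find the file specified.", "", "",
    SEPARATOR, "",
])

# Pattern from the props.conf above.
LINE_BREAKER = r"(\*{78}[\r\n]+)[a-zA-Z]+\s\d+/\d+/\d+"
# Pattern from the BREAK_ONLY_BEFORE setting in the question.
ORIGINAL_PATTERN = r"\*{78}\s*[a-zA-z]{3}\s\d{2}\/\d{2}\/\d{2}\/\d{4}"


def split_events(text, line_breaker):
    """Approximate LINE_BREAKER: cut at each match, dropping capture group 1."""
    events, start = [], 0
    for match in re.finditer(line_breaker, text):
        events.append(text[start:match.start(1)])  # text before the group
        start = match.end(1)                       # next event starts after it
    events.append(text[start:])
    return [event for event in events if event.strip()]


if __name__ == "__main__":
    for number, event in enumerate(split_events(SAMPLE, LINE_BREAKER), 1):
        print(f"--- event {number} ---")
        print(event.rstrip())
    matched = re.search(ORIGINAL_PATTERN, SAMPLE) is not None
    print("original pattern matches:", matched)
```

In this emulation each event starts at the date line, because the divider in front of the date is the discarded capture group, while the divider after the time stays inside the event. The pattern from the question finds no match in the sample at all: it asks for four slash-separated number groups, but the dates only have three. As far as I know, BREAK_ONLY_BEFORE is also tested against one line at a time, so a pattern that spans the divider line and the date line would not fire there even with the date part corrected.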

The other settings are there to improve indexing/parsing performance. TIME_FORMAT was left unset so that Splunk can interpret the timestamp automatically, since the hours and minutes are on the line after the date.
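To illustrate what the timestamp recognition has to deal with: once the events are split, each one starts with the date on its first line and the time on its second. The Python sketch below joins the two lines and parses them. It is only an illustration outside Splunk; parse_event_time and the strptime format string are assumptions for this example, not a tested TIME_FORMAT value.

```python
from datetime import datetime

# First two lines of an event after splitting, as in the log from the question.
EVENT_START = "Mon 01/02/2023\n09:00 AM\n"


def parse_event_time(event: str) -> datetime:
    """Combine the date line and the time line into one datetime."""
    lines = event.splitlines()
    date_part = lines[0].split()[-1]   # "01/02/2023", weekday name dropped
    time_part = lines[1].strip()       # "09:00 AM"
    return datetime.strptime(f"{date_part} {time_part}", "%m/%d/%Y %I:%M %p")


if __name__ == "__main__":
    print(parse_event_time(EVENT_START).isoformat())
```

If automatic recognition ever picks the wrong value, an explicit format covering both lines would be the thing to experiment with, but that would need testing against the real data.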

Hope this will help you, have a good day,

Fabrizio

 


xwill13
Engager

Thanks! This seems to have worked perfectly so far.
