
Issue filtering events with regex and props.conf

xwill13
Engager

Hello, I am trying to figure out how to edit props.conf so that it splits my events properly. The events are added to a log file, which looks like this:

 

******************************************************************************

Mon 01/02/2023
09:00 AM


******************************************************************************

The command completed successfully.

1 file(s) copied.
\\share\folder\folder\folder\file
1 file(s) copied.
1 file(s) copied.


******************************************************************************

Tue 01/03/2023
09:00 AM


******************************************************************************

The command completed successfully.

The system cannot find the file specified.
\\share\folder\folder\folder\file
0 file(s) copied.
The system cannot find the file specified.


******************************************************************************

Wed 01/04/2023
09:00 AM


******************************************************************************

The command completed successfully.

1 file(s) copied.
\\share\folder\folder\folder\file
1 file(s) copied.
1 file(s) copied.


******************************************************************************

Thu 01/05/2023
09:00 AM


******************************************************************************

The command completed successfully.

1 file(s) copied.
\\share\folder\folder\folder\file
1 file(s) copied.
1 file(s) copied.


******************************************************************************




I would like my events to look like this:

******************************************************************************

Mon 01/02/2023
09:00 AM


******************************************************************************

The command completed successfully.

1 file(s) copied.
\\share\folder\folder\folder\file
1 file(s) copied.
1 file(s) copied.

It seems like no matter what I try, I can't get Splunk to separate it properly.

The file updates daily. I have been testing my settings by uploading a copy of the text file directly and then configuring Splunk to monitor the file for continuous updates.

 

Typically the preview for the uploaded file looks somewhat acceptable, like this:

Mon 01/02/2023
09:00 AM


******************************************************************************

The command completed successfully.

1 file(s) copied.
\\share\folder\folder\folder\file
1 file(s) copied.
1 file(s) copied.

 

This output would work; however, I did notice that it consistently cuts off the first line of text. The real problem comes with the monitoring process.

It tends to split the data in a way that seems almost random, and definitely doesn't match my regex settings.

The date, the asterisks and the text get placed into separate events for reasons I don't understand.

 

My props.conf settings are displayed below:

[log_file_test]

BREAK_ONLY_BEFORE = \*{78}\s*[a-zA-z]{3}\s\d{2}\/\d{2}\/\d{2}\/\d{4}

NO_BINARY_CHECK = 1

SHOULD_LINEMERGE=1

category=custom

pulldown_type=1

disabled=false


Any clues as to what I might be doing wrong or neglecting?

 


LRF
Path Finder

Hi @xwill13 ,

These are the props settings that I would use for your input:

 

[log_file_test]
MAX_TIMESTAMP_LOOKAHEAD=0
TIME_PREFIX=^
SHOULD_LINEMERGE=false
LINE_BREAKER=(\*{78}[\r\n]+)[a-zA-Z]+\s\d+/\d+/\d+
TRUNCATE=500
NO_BINARY_CHECK=true

 

It is usually better to set LINE_BREAKER together with SHOULD_LINEMERGE=false. Otherwise Splunk first breaks the input into single lines (using the default line breaker) and then spends resources merging those lines back into events.
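To get a feel for what that LINE_BREAKER does before touching the indexer, here is a rough Python sketch that imitates the behaviour: the text matched by the first capture group is thrown away, whatever comes before it closes the previous event, and whatever comes after it opens the next one. This is only an approximation and not how Splunk is implemented; `split_events` and `SAMPLE` are made-up names, the sample is a shortened copy of the log layout from the question, and Splunk uses PCRE rather than Python's `re` (for this pattern the two should agree).

```python
import re

# LINE_BREAKER value copied from the stanza above.
LINE_BREAKER = re.compile(r"(\*{78}[\r\n]+)[a-zA-Z]+\s\d+/\d+/\d+")

STARS = "*" * 78

# Shortened, hypothetical sample in the same layout as the log in the question.
SAMPLE = "\n".join([
    STARS, "", "Mon 01/02/2023", "09:00 AM", "", STARS, "",
    "The command completed successfully.", "", "1 file(s) copied.", "",
    STARS, "", "Tue 01/03/2023", "09:00 AM", "", STARS, "",
    "The command completed successfully.", "", "0 file(s) copied.", "",
])


def split_events(raw):
    """Split raw text at each match, dropping the first capture group."""
    events = []
    start = 0
    for match in LINE_BREAKER.finditer(raw):
        events.append(raw[start:match.start(1)])  # text before the group ends an event
        start = match.end(1)                      # text after the group starts the next one
    events.append(raw[start:])
    # The piece before the very first separator is empty, so skip blank pieces.
    return [event for event in events if event.strip()]


events = split_events(SAMPLE)
for number, event in enumerate(events, start=1):
    print(f"--- event {number} ---")
    print(event.rstrip())
```

In this sketch each event starts on the date line, which is what TIME_PREFIX=^ expects. Only the asterisk line directly in front of a date is consumed as the separator; the asterisk line between the time and the command output stays inside the event.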

The other settings are there to improve indexing/parsing performance. TIME_FORMAT was left unset so that Splunk interprets the timestamp automatically, since the hours and minutes are on the line after the date.
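For comparison, the BREAK_ONLY_BEFORE pattern from the original stanza can be checked the same way outside Splunk. The Python sketch below rests on two assumptions: that with SHOULD_LINEMERGE enabled the pattern is tested against one line at a time (so a pattern that needs the asterisk line and the date line together has nothing to match), and that Python's `re` is close enough to Splunk's PCRE for this pattern. The variable names and the shortened sample lines are made up for the illustration.

```python
import re

# Pattern copied unchanged from the original stanza in the question.
ORIGINAL = re.compile(r"\*{78}\s*[a-zA-z]{3}\s\d{2}\/\d{2}\/\d{2}\/\d{4}")

STARS = "*" * 78

# Shortened, hypothetical sample: one header block plus the first text line.
lines = [STARS, "Mon 01/02/2023", "09:00 AM", STARS,
         "The command completed successfully."]

# Tested line by line: no single line holds both the asterisks and the date.
per_line_hits = [line for line in lines if ORIGINAL.search(line)]

# Tested against the whole text: still no match, because the date part
# asks for three slash-separated groups before the four-digit year.
whole_text_hit = ORIGINAL.search("\n".join(lines))

# The date part on its own, as written and with the extra group removed.
DATE_AS_WRITTEN = re.compile(r"\d{2}\/\d{2}\/\d{2}\/\d{4}")
DATE_TWO_SLASHES = re.compile(r"\d{2}/\d{2}/\d{4}")

print("per-line hits:", per_line_hits)
print("whole-text hit:", whole_text_hit)
print("date as written:", DATE_AS_WRITTEN.search("Mon 01/02/2023"))
print("date with two slashes:", DATE_TWO_SLASHES.search("Mon 01/02/2023").group())
```

So even before the line-merging question comes into play, the date portion of that pattern does not fit dates such as 01/02/2023. The `[a-zA-z]` range is also wider than intended (it includes a few punctuation characters between `Z` and `a`), although that is not what stops the match here.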

Hope this will help you, have a good day,

Fabrizio

 


xwill13
Engager

Thanks! This seems to have worked perfectly so far.
