Delay Ingestion at the Universal Forwarder Until Event Is Complete

ArchieCrozier
Path Finder

I have an interesting dilemma. I believe there is a solution, but I could use some advice on this one.

We have a log file that records submitted requests in the following format:

8038$$DRY ETCH$$3/9/2021 9:45:22 AM$$[More columns separated by double-$]

The first "field" is the request ID, followed by "$$", then an area, then "$$", then the actual date/time, and so on.

The issue is that every time a new entry is filled out, the next request ID is also added to the log file, so if the record above was the last one entered, the log file would end with the following records:

8037$$CMP$$3/9/2021 7:32:04 AM$$[More columns separated by double-$]
8038$$DRY ETCH$$3/9/2021 9:45:22 AM$$[More columns separated by double-$]
8039

My problem is that at 9:45:22 AM, Splunk is ingesting this as the event:
"$$DRY ETCH$$3/9/2021 9:45:22 AM$$[More columns separated by double-$]
8039"

The request ID for the current event was already ingested as the tail of the previous event. There are often hours between requests. I want the event to break immediately at the [\r\n] and NOT ingest the request ID on the last row of the log file until hours later, when a new request is entered and that row is completed.
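
As a rough illustration of the behavior I'm after (plain Python, not a Splunk setting; the function name and the "colA" column are made up for the example): everything up to the last line break is treated as complete, and whatever follows it is held back until more text arrives.

```python
def split_complete(buffer):
    """Return (complete_lines, held_back_tail) for the text read so far."""
    head, sep, tail = buffer.rpartition("\n")
    if not sep:
        # No line break yet: nothing is complete, hold everything back.
        return [], buffer
    return head.splitlines(), tail

# "colA" is a placeholder for the remaining columns.
so_far = (
    "8037$$CMP$$3/9/2021 7:32:04 AM$$colA\n"
    "8038$$DRY ETCH$$3/9/2021 9:45:22 AM$$colA\n"
    "8039"
)
complete, held = split_complete(so_far)

# Hours later the rest of request 8039 is written, followed by the next ID.
later = held + "$$CMP$$3/9/2021 1:10:00 PM$$colA\n8040"
complete_later, held_later = split_complete(later)
```

Here `held` is the bare "8039", and it only shows up in `complete_later` once its row has been finished.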

This is my props.conf stanza:
[rdaeng_submissionmetrics]
TIME_PREFIX = ^\s*\d+\${2}[\s\w\d-,]+\${2}
MAX_TIMESTAMP_LOOKAHEAD = 65
TIME_FORMAT = %m/%d/%Y %I:%M:%S %p
LINE_BREAKER = ([\r\n]+)^\s*\d+\${2}[\s\w\d-]+\${2}\d{1,2}\/\d{1,2}\/\d{4}\s+\d{1,2}:\d{2}:\d{2}\s+(AM|PM)\${2}
SHOULD_LINEMERGE = false
TRUNCATE = 999999
MAX_EVENTS = 2048
ANNOTATE_PUNCT = false
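
To show why the bare ID ends up on the previous event, here is a small Python sketch that applies the LINE_BREAKER pattern above to the text written at 9:45:22. It only mimics the documented rule that the first capture group is discarded as the delimiter; the `re.MULTILINE` flag, the `break_events` helper, and the "colA" column are assumptions for the illustration, not Splunk internals.

```python
import re

# LINE_BREAKER copied from the stanza; capture group 1 is the discarded delimiter.
# re.MULTILINE is an assumption so that ^ can match right after the newline.
LINE_BREAKER = re.compile(
    r"([\r\n]+)^\s*\d+\${2}[\s\w\d-]+\${2}"
    r"\d{1,2}\/\d{1,2}\/\d{4}\s+\d{1,2}:\d{2}:\d{2}\s+(AM|PM)\${2}",
    re.MULTILINE,
)

def break_events(text):
    """Split text wherever LINE_BREAKER matches, dropping capture group 1."""
    events, start = [], 0
    for match in LINE_BREAKER.finditer(text):
        events.append(text[start:match.start(1)])
        start = match.end(1)
    events.append(text[start:])
    return events

# Text appended to the log at 9:45:22 ("colA" stands in for the other columns).
newly_written = "$$DRY ETCH$$3/9/2021 9:45:22 AM$$colA\n8039"
events = break_events(newly_written)

# The same pattern applied to the whole file in one pass.
whole_file = "8037$$CMP$$3/9/2021 7:32:04 AM$$colA\n8038" + newly_written
events_whole = break_events(whole_file)
```

Because the line "8039" is not yet followed by "$$", an area, and a timestamp, the pattern never matches in front of it, so `events` holds a single event with "8039" still attached.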

I was thinking about removing LINE_BREAKER and adding BREAK_ONLY_BEFORE = \d+\${2}[\s\w\d-]+\${2}\d{1,2}\/\d{1,2}\/\d{4}\s+\d{1,2}:\d{2}:\d{2}\s+(AM|PM)\${2}

Any suggestions on the best method? If I use BREAK_ONLY_BEFORE, will it still add the future request ID as the tail of the latest event?

If I use MUST_BREAK_AFTER = \$\$(No|Yes).*##(No|Yes)##(No|Yes)[\r\n]+

would it still record the 4-digit number of the next request ID as a record by itself?

If I set up a transform to throw out any record that consists solely of a 4-digit number, would the universal forwarder send that number again the next time? I doubt it: the UF keeps track of the last position it sent, and it is the heavy forwarder that throws out the request ID, so the UF doesn't even know it was thrown out. I'm stuck. "Help me, Obi-Splunk Kenobi. You're my only hope."
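
My reasoning about the transform can be sketched in Python. This is a toy model, not how Splunk is implemented: `TailReader` stands in for a forwarder that only remembers a read offset, `drop_bare_ids` stands in for the hypothetical filtering transform, and "colA" is a placeholder column.

```python
import re

class TailReader:
    """Toy stand-in for a forwarder that remembers how far it has read."""

    def __init__(self):
        self.offset = 0

    def read_new(self, file_text):
        # Return only the text appended since the previous read.
        new_text = file_text[self.offset:]
        self.offset = len(file_text)
        return new_text

def drop_bare_ids(events):
    """Discard events that consist of nothing but digits."""
    return [e for e in events if not re.fullmatch(r"\s*\d+\s*", e)]

reader = TailReader()

# 7:32:04 AM: request 8037 is completed and the next ID is written.
file_v1 = "8037$$CMP$$3/9/2021 7:32:04 AM$$colA\n8038"
first_batch = drop_bare_ids(reader.read_new(file_v1).split("\n"))

# 9:45:22 AM: request 8038 is completed and the next ID is written.
file_v2 = file_v1 + "$$DRY ETCH$$3/9/2021 9:45:22 AM$$colA\n8039"
second_batch = drop_bare_ids(reader.read_new(file_v2).split("\n"))
```

In this model `second_batch` starts at "$$DRY ETCH", so the discarded "8038" is never read again and the event loses its request ID.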

ArchieCrozier
Path Finder

I'm going to try using:

MUST_NOT_BREAK_AFTER = \d+
MUST_BREAK_AFTER = [\r\n]+

I suspect, though, that the request ID will still be ingested at the time it is first logged, not later when the event for that request ID is completed.

ArchieCrozier
Path Finder

I found this post by @joesrepsolc useful, but it mostly covers ingesting an entire file as a single event. It was answered nicely by @bandit: https://community.splunk.com/t5/Getting-Data-In/How-to-set-a-large-log-to-ingest-as-one-single-event...