I have an event like this:
~01~20241009-100922;899~19700101-000029;578~ASDF~QWER~YXCV
There are two timestamps in this event. I have set up my stanza to extract the second one, but in this particular case the second one is what I consider "bad".
For the record, here is my props.conf:
[QWERTY]
SHOULD_LINEMERGE = true
BREAK_ONLY_BEFORE_DATE = true
MAX_TIMESTAMP_LOOKAHEAD = 43
TIME_FORMAT = %Y%m%d-%H%M%S;%3N
TIME_PREFIX = ^~\d{2}~.{0,19}~
MAX_DAYS_AGO = 10951
REPORT-1 = some-report-1
REPORT-2 = some-report-2
The consequence of this seems to be that Splunk indexes the entire file as a single event, which is something I absolutely want to avoid.
Also, I do need to use line merging, as the same file may contain XML dumps.
So what I need is something that implements the following logic:
if second_timestamp_is_bad:
    extract_first_timestamp()
else:
    extract_second_timestamp()
Any tips or hints on how to mitigate this scenario using only options and functionality provided by Splunk are greatly appreciated.
Adding to what's already been said: there is very rarely a legitimate use case for SHOULD_LINEMERGE. Relying on Splunk recognizing something as a date to break the data stream into events is not a good idea. Instead, set a proper LINE_BREAKER.
Timestamps are extracted before INGEST_EVAL is performed, so you'll need to use the := operator to replace _time.
These props should work better than those shown.
[QWERTY]
# better performance with LINE_BREAKER
SHOULD_LINEMERGE = false
# We're not breaking events before a date
BREAK_ONLY_BEFORE_DATE = false
# Break events after newline and before "~01~"
LINE_BREAKER = ([\r\n]+)~\d\d~
MAX_TIMESTAMP_LOOKAHEAD = 43
TIME_FORMAT = %Y%m%d-%H%M%S;%3N
# Skip to the second timestamp (after the milliseconds of the first TS)
TIME_PREFIX = ;\d{3}~
MAX_DAYS_AGO = 10951
REPORT-1 = some-report-1 REPORT-2 = some-report-2
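To illustrate the := replacement mentioned above, something along these lines could serve as a starting point. This is only a rough, untested sketch: the stanza name fix_bad_second_timestamp, the 1980 cutoff date, and the substr() offsets are assumptions derived from the sample event, not settings taken from this thread.
# transforms.conf (sketch; stanza name, cutoff and offsets are assumptions)
[fix_bad_second_timestamp]
# If the timestamp Splunk extracted is suspiciously old (e.g. the 1970 epoch
# from the bad second timestamp), re-parse the first timestamp from _raw instead.
# substr(_raw, 5, 15) picks "20241009-100922" out of "~01~20241009-100922;899~...";
# milliseconds are dropped here for simplicity.
INGEST_EVAL = _time := if(_time < strptime("1980-01-01", "%Y-%m-%d"), strptime(substr(_raw, 5, 15), "%Y%m%d-%H%M%S"), _time)

# props.conf (in addition to the settings shown above)
[QWERTY]
TRANSFORMS-fix_bad_second_timestamp = fix_bad_second_timestamp
If millisecond precision matters, the substring length could be extended to 19 and the format string to %Y%m%d-%H%M%S;%3N.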
I see that INGEST_EVAL allows for the use of conditionals. Thank you very much, I'll give that a try.