I have logs coming in that are either straight text (single line) or text with a JSON string as well.
I have no issues with the straight text, but if there is additional JSON, the event breaks on an attribute with a date.
If the JSON has no additional date, it appears to be OK.
Sample log event with JSON
2018-11-28T11:25:32.876+0000 STDIO [INFO] 2018-11-28 11:25:32 [Thread-3-ESWriterBolt] DEBUG BaseBolt - {
"attribute1": 243,
"attribute2": "Standard",
"attribute3": "2018-11-28T13:11:45.3720",
"attribute4": "Y"
}
Everything up to attribute2 reads fine. However, attribute3 starts a new event, timestamped with the date value there, which runs to the end of the JSON or until another date field appears.
The current props.conf for this log type just parses a few fields and also includes TRUNCATE = 0 for no truncation of these events.
What additional settings do I need to set up in props.conf to make this work?
Thanks!
You can try the configuration below on the indexer or heavy forwarder, whichever the data reaches first after the universal forwarder.
props.conf
[yoursourcetype]
SHOULD_LINEMERGE=false
NO_BINARY_CHECK=true
TIME_FORMAT=%Y-%m-%dT%H:%M:%S.%3N%z
MAX_TIMESTAMP_LOOKAHEAD=28
LINE_BREAKER=([\r\n]+)\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}\W\d{4}
SHOULD_LINEMERGE must be set to false when you use LINE_BREAKER.
Other than that, this should do the trick. The reason for this behavior: by default, Splunk automatically detects timestamps and assumes that each one marks the start of a new event. That works fine for single-line events, or for events that have one timestamp on their first line. For events like yours, with a second date inside the body, it doesn't behave the way you want.
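To see why the explicit LINE_BREAKER avoids the break at attribute3, here is a rough Python sketch that approximates the behavior outside Splunk. It is not Splunk's actual implementation: split_events is a hypothetical helper, the second event in SAMPLE is made up for illustration, and the attribute3 value is assumed to be quoted. The idea is only that the text matched by the first capture group is dropped and the next event starts right after it.

```python
import re

# LINE_BREAKER pattern from the answer above. Group 1 is the text that is
# discarded between events; the rest must match the start of the next event.
LINE_BREAKER = re.compile(
    r"([\r\n]+)\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}\W\d{4}"
)

def split_events(raw):
    """Approximate LINE_BREAKER: break at each group-1 match and drop it."""
    events, start = [], 0
    for m in LINE_BREAKER.finditer(raw):
        events.append(raw[start:m.start(1)])
        start = m.end(1)
    events.append(raw[start:])
    return events

# First event is the sample from the question; the second one is hypothetical.
SAMPLE = (
    '2018-11-28T11:25:32.876+0000 STDIO [INFO] 2018-11-28 11:25:32 '
    '[Thread-3-ESWriterBolt] DEBUG BaseBolt - {\n'
    '"attribute1": 243,\n'
    '"attribute2": "Standard",\n'
    '"attribute3": "2018-11-28T13:11:45.3720",\n'
    '"attribute4": "Y"\n'
    '}\n'
    '2018-11-28T11:25:33.001+0000 STDIO [INFO] plain single-line message'
)

events = split_events(SAMPLE)
for number, event in enumerate(events, 1):
    print(f"--- event {number} ---")
    print(event)
```

The date in attribute3 never triggers a break here because it is not at the start of a line, and it also lacks the three-digit fraction followed by a UTC offset that the pattern requires.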
In general, it is better to define a specific LINE_BREAKER, set SHOULD_LINEMERGE to false, and configure timestamp extraction explicitly (TIME_PREFIX, TIME_FORMAT, MAX_TIMESTAMP_LOOKAHEAD). This not only makes parsing more reliable, it also greatly improves performance, because Splunk doesn't have to apply all of its auto-detection magic.
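As a quick sanity check on the timestamp settings, the following Python sketch shows that a 28-character lookahead covers exactly the leading timestamp of the sample event and nothing after it. This only mimics the idea; Splunk's own timestamp processor works differently, and Python's strptime has no %3N, so %f is used as a stand-in. The event_time helper is hypothetical.

```python
from datetime import datetime

# Stand-in for TIME_FORMAT=%Y-%m-%dT%H:%M:%S.%3N%z (Python uses %f, not %3N).
PY_TIME_FORMAT = "%Y-%m-%dT%H:%M:%S.%f%z"
MAX_TIMESTAMP_LOOKAHEAD = 28

def event_time(event):
    """Parse only the first MAX_TIMESTAMP_LOOKAHEAD characters of the event."""
    return datetime.strptime(event[:MAX_TIMESTAMP_LOOKAHEAD], PY_TIME_FORMAT)

first_line = (
    "2018-11-28T11:25:32.876+0000 STDIO [INFO] 2018-11-28 11:25:32 "
    "[Thread-3-ESWriterBolt] DEBUG BaseBolt - {"
)

ts = event_time(first_line)
print(ts.isoformat())
```

Because the window stops after the UTC offset, the second date on the first line and the date in attribute3 are never considered for the event timestamp.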
Thanks @FrankVI, updated original answer, didn't notice this because I was playing with only one event.
Thanks to both of you for your help. I had tried a LINE_BREAKER previously, but it looks like my regex wasn't quite correct. First indications in the development lab are that this is working.
Hi,
Can you please post your props.conf for the above data?
It really isn't much for the log file type:
[storm]
EXTRACT-Storm_Class_MessageType = ^[^ \n]* (?P[^ ]+)\s+[(?P\w+)
TRUNCATE = 0
The extraction is to pull some data out of the text part of the message, which is working fine.