My log source generates events terminated with a null character ('\x00') and sends them to Splunk via TCP in chunks every 10 seconds. So one TCP connection can contain several events, separated by null characters. However, my Splunk instance (on a Microsoft Windows Server machine) doesn't recognize the null character as an event separator and replaces it with the literal string "\x00". I've tried several regex patterns as the line breaker:
LINE_BREAKER = ([\r\n\0]+)
LINE_BREAKER = ([\r\n\00]+)
LINE_BREAKER = ([\r\n\x00]+)
LINE_BREAKER = ([\r\n]+)|(\0+)
LINE_BREAKER = ([\r\n]+)|(\00+)
LINE_BREAKER = ([\r\n]+)|(\x00+)
... but unfortunately they don't work for me.
Currently I'm using the '\\x00' pattern to separate events, but it is definitely not a good solution.
Any suggestions?
UPD:
Sample hex dump (from a Wireshark capture) of a TCP connection containing 4 events:
0000 3c 31 34 3e 4a 61 6e 20 32 37 20 31 37 3a 35 39 <14>Jan 27 17:59
0010 3a 34 38 20 43 45 46 3a 30 7c 53 61 6d 70 6c 65 :48 CEF:0|Sample
0020 20 45 76 65 6e 74 7c 31 2e 30 7c 48 54 54 50 20 Event|1.0|HTTP
0030 54 72 61 6e 73 61 63 74 69 6f 6e 7c 31 20 72 65 Transaction|1 re
0040 71 4d 65 74 68 6f 64 3d 50 4f 53 54 20 75 73 65 qMethod=POST use
0050 72 41 67 65 6e 74 3d 4d 69 63 72 6f 73 6f 66 74 rAgent=Microsoft
0060 20 4f 66 66 69 63 65 2f 31 35 2e 30 20 28 57 69 Office/15.0 (Wi
0070 6e 64 6f 77 73 20 4e 54 20 36 2e 32 3b 20 4d 69 ndows NT 6.2; Mi
0080 63 72 6f 73 6f 66 74 20 4f 75 74 6c 6f 6f 6b 20 crosoft Outlook
0090 31 35 2e 30 2e 34 34 32 30 3b 20 50 72 6f 29 20 15.0.4420; Pro)
00a0 69 6e 3d 33 35 38 00 3c 31 34 3e 4a 61 6e 20 32 in=358.<14>Jan 2
00b0 37 20 31 37 3a 35 39 3a 34 38 20 43 45 46 3a 30 7 17:59:48 CEF:0
00c0 7c 53 61 6d 70 6c 65 20 45 76 65 6e 74 7c 31 2e |Sample Event|1.
00d0 30 7c 48 54 54 50 20 54 72 61 6e 73 61 63 74 69 0|HTTP Transacti
00e0 6f 6e 7c 31 20 72 65 71 4d 65 74 68 6f 64 3d 50 on|1 reqMethod=P
00f0 4f 53 54 20 75 73 65 72 41 67 65 6e 74 3d 4d 69 OST userAgent=Mi
0100 63 72 6f 73 6f 66 74 20 4f 66 66 69 63 65 2f 31 crosoft Office/1
0110 35 2e 30 20 28 57 69 6e 64 6f 77 73 20 4e 54 20 5.0 (Windows NT
0120 36 2e 32 3b 20 4d 69 63 72 6f 73 6f 66 74 20 4f 6.2; Microsoft O
0130 75 74 6c 6f 6f 6b 20 31 35 2e 30 2e 34 34 32 30 utlook 15.0.4420
0140 3b 20 50 72 6f 29 20 69 6e 3d 34 35 31 00 3c 31 ; Pro) in=451.<1
0150 34 3e 4a 61 6e 20 32 37 20 31 37 3a 35 39 3a 34 4>Jan 27 17:59:4
0160 38 20 43 45 46 3a 30 7c 53 61 6d 70 6c 65 20 45 8 CEF:0|Sample E
0170 76 65 6e 74 7c 31 2e 30 7c 48 54 54 50 20 54 72 vent|1.0|HTTP Tr
0180 61 6e 73 61 63 74 69 6f 6e 7c 31 20 72 65 71 4d ansaction|1 reqM
0190 65 74 68 6f 64 3d 50 4f 53 54 20 75 73 65 72 41 ethod=POST userA
01a0 67 65 6e 74 3d 4d 69 63 72 6f 73 6f 66 74 20 4f gent=Microsoft O
01b0 66 66 69 63 65 2f 31 35 2e 30 20 28 57 69 6e 64 ffice/15.0 (Wind
01c0 6f 77 73 20 4e 54 20 36 2e 32 3b 20 4d 69 63 72 ows NT 6.2; Micr
01d0 6f 73 6f 66 74 20 4f 75 74 6c 6f 6f 6b 20 31 35 osoft Outlook 15
01e0 2e 30 2e 34 34 32 30 3b 20 50 72 6f 29 20 69 6e .0.4420; Pro) in
01f0 3d 33 35 38 00 3c 31 34 3e 4a 61 6e 20 32 37 20 =358.<14>Jan 27
0200 31 37 3a 35 39 3a 34 38 20 43 45 46 3a 30 7c 53 17:59:48 CEF:0|S
0210 61 6d 70 6c 65 20 45 76 65 6e 74 7c 31 2e 30 7c ample Event|1.0|
0220 48 54 54 50 20 54 72 61 6e 73 61 63 74 69 6f 6e HTTP Transaction
0230 7c 31 20 72 65 71 4d 65 74 68 6f 64 3d 50 4f 53 |1 reqMethod=POS
0240 54 20 75 73 65 72 41 67 65 6e 74 3d 4d 69 63 72 T userAgent=Micr
0250 6f 73 6f 66 74 20 4f 66 66 69 63 65 2f 31 35 2e osoft Office/15.
0260 30 20 28 57 69 6e 64 6f 77 73 20 4e 54 20 36 2e 0 (Windows NT 6.
0270 32 3b 20 4d 69 63 72 6f 73 6f 66 74 20 4f 75 74 2; Microsoft Out
0280 6c 6f 6f 6b 20 31 35 2e 30 2e 34 34 32 30 3b 20 look 15.0.4420;
0290 50 72 6f 29 20 69 6e 3d 34 35 31 00 Pro) in=451.
That's what I see in Splunk Web.
<14>Jan 27 17:59:48 CEF:0|Sample Event|1.0|HTTP Transaction|1 reqMethod=POST userAgent=Microsoft Office/15.0 (Windows NT 6.2; Microsoft Outlook 15.0.4420; Pro) in=358\x00<14>Jan 27 17:59:48 CEF:0|Sample Event|1.0|HTTP Transaction|1 reqMethod=POST userAgent=Microsoft Office/15.0 (Windows NT 6.2; Microsoft Outlook 15.0.4420; Pro) in=451\x00<14>Jan 27 17:59:48 CEF:0|Sample Event|1.0|HTTP Transaction|1 reqMethod=POST userAgent=Microsoft Office/15.0 (Windows NT 6.2; Microsoft Outlook 15.0.4420; Pro) in=358\x00<14>Jan 27 17:59:48 CEF:0|Sample Event|1.0|HTTP Transaction|1 reqMethod=POST userAgent=Microsoft Office/15.0 (Windows NT 6.2; Microsoft Outlook 15.0.4420; Pro) in=451\x00
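To make the difference between the capture and the Splunk Web output concrete, here is a small Python sketch (an illustration only, not something Splunk runs). It decodes the row at offset 00a0 of the dump above and compares it with the same span as displayed in Splunk Web. The variable names are made up for the example.

```python
# Row 00a0 of the capture above, copied byte for byte.
wire_row = bytes.fromhex("69 6e 3d 33 35 38 00 3c 31 34 3e 4a 61 6e 20 32")

# On the wire the separator is a single byte with value 0.
separator_offset = wire_row.index(0)
wire_events = wire_row.split(b"\x00")

# The same span as Splunk Web displays it: a backslash, 'x', '0', '0'.
# A raw string keeps these as four ordinary characters, not a NUL.
displayed = r"in=358\x00<14>Jan 2"

print(separator_offset)
print(wire_events)
print(len(wire_row), len(displayed))
```

The displayed text is three characters longer than the wire data for each separator, which is consistent with one NUL byte having been turned into a four-character escape sequence somewhere before display.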
Well, after a bunch of experiments I came up with the following solution:
If an event source sends events separated by null characters, then, most likely, Splunk (or something else) replaces each null character with the literal string "\x00" before applying the line-breaking and merging rules. In this case, add the pattern "\\x00" to the LINE_BREAKER property in props.conf (note the double backslash: the pattern must match the four-character string "\x00", not the null character itself).
For example:
LINE_BREAKER = (\\x00)
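The difference between the two spellings can be sketched in Python. This uses Python's re module as a stand-in for Splunk's regex engine, on the assumption that both treat \x00 and \\x00 the same way; the sample strings and the split_events helper are hypothetical and shortened for the example.

```python
import re

# Text as Splunk Web showed it: the four literal characters \ x 0 0
# between events (raw string, so no real NUL byte is present).
escaped_text = r"in=358\x00<14>Jan 27 in=451\x00"

# Text as it appeared in the packet capture: a real NUL byte.
raw_text = "in=358\x00<14>Jan 27 in=451\x00"

literal_pattern = re.compile(r"\\x00")  # regex text \\x00: backslash, x, 0, 0
nul_pattern = re.compile(r"\x00")       # regex text \x00: the NUL character


def split_events(pattern, text):
    """Split text on the pattern and drop empty trailing pieces."""
    return [part for part in pattern.split(text) if part]


print(split_events(literal_pattern, escaped_text))
print(split_events(nul_pattern, escaped_text))
print(split_events(nul_pattern, raw_text))
```

Only the double-backslash pattern splits the escaped text into two events; the single-backslash pattern splits only when a real NUL byte is present. That matches the behaviour described above, but it does not show where in Splunk the replacement actually happens.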
If your problem is solved, please accept an answer.
All right, let's try a few more. Start with this one -
LINE_BREAKER = (?:\x00)
SHOULD_LINEMERGE = false
LEARN_MODEL = false
That's one way in Splunk to specify the null byte. Leave out the + for now. Hopefully, that will get Splunk to split at ONLY the null byte, and the non-capturing group might help in some way (it was in one of the accepted answers to something else, referenced further below).
Assuming that works, then we'll need one perhaps like this -
LINE_BREAKER = (?:[\r\n\x00]+)
SHOULD_LINEMERGE = false
LEARN_MODEL = false
A couple of references -
https://answers.splunk.com/answers/27720/understanding-line-breaker-regexes.html
https://answers.splunk.com/answers/154819/how-to-configure-line-breaker-regex-in-props-conf.html
This one might be very helpful. They use "SEDCMD-remove_nulls = s/\x00//g" to completely delete the nulls. You could do something like that, but change the \x00 characters to \r\n instead. It's also where I noticed the non-capturing group (?: ).
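As a rough illustration of what those two sed-style substitutions do to the text, here is a Python sketch using re.sub. It is only an analogue of the sed expressions, not Splunk's own processing, and the sample string is a shortened, made-up event pair.

```python
import re

# A real NUL byte after each of two shortened events (assumed sample data).
raw = "in=358\x00<14>Jan 27 ... in=451\x00"

# Analogue of s/\x00//g: delete every NUL.
removed = re.sub(r"\x00", "", raw)

# Analogue of replacing each NUL with a newline instead.
as_newlines = re.sub(r"\x00", "\n", raw)

print(repr(removed))
print(as_newlines.splitlines())
```

This shows only the effect of the substitution on a string. Whether SEDCMD is applied before or after line breaking is not something this sketch establishes, and if the text already contains the literal characters \x00 rather than a NUL byte, the expression would need to match \\x00 instead.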
https://answers.splunk.com/answers/83790/how-do-i-remove-x00-characters-from-my-log-message.html
And charset may be involved if that doesn't work -
https://answers.splunk.com/answers/106700/seing-null-x00-bytes-in-indexed-data-from-log-file-in-wind...
First of all, thank you for your suggestions!
Unfortunately, using "LINE_BREAKER = (?:\x00)" doesn't solve the problem, at least in my case. I also thought about the charset, but according to the packet dump it is definitely ASCII and Splunk recognizes it correctly, so from my point of view it is not the issue.
Given that "LINE_BREAKER = (\\x00)" actually works, I suppose that Splunk (or something else) replaces the null character with the character sequence '\' 'x' '0' '0' before line breaking and merging happen.
Sure! I saw a comment somewhere about that replacement of the null byte with a char representation, but I wasn't sure when and where it might apply. Glad you managed to work it out.
For those who come along later, please post the answer you found to your own question and mark it accepted.
A small chunk of hex dump of the data at the beginning/end interface of a few events would be helpful.
Can we have some sample events?