Getting Data In

How to configure line breaking on TCP events separated with null character ('\x00')?

vsilchev
Explorer

My log source generates events terminated with a null character ('\x00') and sends them to Splunk via TCP in chunks every 10 seconds, so one TCP connection can contain several events separated by null characters. However, my Splunk instance (on a Microsoft Windows Server machine) doesn't recognize the null character as an event separator and instead shows it as the literal string "\x00". I've tried several regex patterns as the line breaker:

LINE_BREAKER = ([\r\n\0]+)
LINE_BREAKER = ([\r\n\00]+)
LINE_BREAKER = ([\r\n\x00]+)
LINE_BREAKER = ([\r\n]+)|(\0+)
LINE_BREAKER = ([\r\n]+)|(\00+)
LINE_BREAKER = ([\r\n]+)|(\x00+)

... but unfortunately they don't work for me.

Currently I'm using a pattern that matches the literal string "\x00" to separate events, but that is definitely not a good solution.

Any suggestions?

UPD:

Sample hex dump (from a Wireshark capture) of a TCP connection containing 4 events:

0000   3c 31 34 3e 4a 61 6e 20 32 37 20 31 37 3a 35 39  <14>Jan 27 17:59
0010   3a 34 38 20 43 45 46 3a 30 7c 53 61 6d 70 6c 65  :48 CEF:0|Sample
0020   20 45 76 65 6e 74 7c 31 2e 30 7c 48 54 54 50 20   Event|1.0|HTTP 
0030   54 72 61 6e 73 61 63 74 69 6f 6e 7c 31 20 72 65  Transaction|1 re
0040   71 4d 65 74 68 6f 64 3d 50 4f 53 54 20 75 73 65  qMethod=POST use
0050   72 41 67 65 6e 74 3d 4d 69 63 72 6f 73 6f 66 74  rAgent=Microsoft
0060   20 4f 66 66 69 63 65 2f 31 35 2e 30 20 28 57 69   Office/15.0 (Wi
0070   6e 64 6f 77 73 20 4e 54 20 36 2e 32 3b 20 4d 69  ndows NT 6.2; Mi
0080   63 72 6f 73 6f 66 74 20 4f 75 74 6c 6f 6f 6b 20  crosoft Outlook 
0090   31 35 2e 30 2e 34 34 32 30 3b 20 50 72 6f 29 20  15.0.4420; Pro) 
00a0   69 6e 3d 33 35 38 00 3c 31 34 3e 4a 61 6e 20 32  in=358.<14>Jan 2
00b0   37 20 31 37 3a 35 39 3a 34 38 20 43 45 46 3a 30  7 17:59:48 CEF:0
00c0   7c 53 61 6d 70 6c 65 20 45 76 65 6e 74 7c 31 2e  |Sample Event|1.
00d0   30 7c 48 54 54 50 20 54 72 61 6e 73 61 63 74 69  0|HTTP Transacti
00e0   6f 6e 7c 31 20 72 65 71 4d 65 74 68 6f 64 3d 50  on|1 reqMethod=P
00f0   4f 53 54 20 75 73 65 72 41 67 65 6e 74 3d 4d 69  OST userAgent=Mi
0100   63 72 6f 73 6f 66 74 20 4f 66 66 69 63 65 2f 31  crosoft Office/1
0110   35 2e 30 20 28 57 69 6e 64 6f 77 73 20 4e 54 20  5.0 (Windows NT 
0120   36 2e 32 3b 20 4d 69 63 72 6f 73 6f 66 74 20 4f  6.2; Microsoft O
0130   75 74 6c 6f 6f 6b 20 31 35 2e 30 2e 34 34 32 30  utlook 15.0.4420
0140   3b 20 50 72 6f 29 20 69 6e 3d 34 35 31 00 3c 31  ; Pro) in=451.<1
0150   34 3e 4a 61 6e 20 32 37 20 31 37 3a 35 39 3a 34  4>Jan 27 17:59:4
0160   38 20 43 45 46 3a 30 7c 53 61 6d 70 6c 65 20 45  8 CEF:0|Sample E
0170   76 65 6e 74 7c 31 2e 30 7c 48 54 54 50 20 54 72  vent|1.0|HTTP Tr
0180   61 6e 73 61 63 74 69 6f 6e 7c 31 20 72 65 71 4d  ansaction|1 reqM
0190   65 74 68 6f 64 3d 50 4f 53 54 20 75 73 65 72 41  ethod=POST userA
01a0   67 65 6e 74 3d 4d 69 63 72 6f 73 6f 66 74 20 4f  gent=Microsoft O
01b0   66 66 69 63 65 2f 31 35 2e 30 20 28 57 69 6e 64  ffice/15.0 (Wind
01c0   6f 77 73 20 4e 54 20 36 2e 32 3b 20 4d 69 63 72  ows NT 6.2; Micr
01d0   6f 73 6f 66 74 20 4f 75 74 6c 6f 6f 6b 20 31 35  osoft Outlook 15
01e0   2e 30 2e 34 34 32 30 3b 20 50 72 6f 29 20 69 6e  .0.4420; Pro) in
01f0   3d 33 35 38 00 3c 31 34 3e 4a 61 6e 20 32 37 20  =358.<14>Jan 27 
0200   31 37 3a 35 39 3a 34 38 20 43 45 46 3a 30 7c 53  17:59:48 CEF:0|S
0210   61 6d 70 6c 65 20 45 76 65 6e 74 7c 31 2e 30 7c  ample Event|1.0|
0220   48 54 54 50 20 54 72 61 6e 73 61 63 74 69 6f 6e  HTTP Transaction
0230   7c 31 20 72 65 71 4d 65 74 68 6f 64 3d 50 4f 53  |1 reqMethod=POS
0240   54 20 75 73 65 72 41 67 65 6e 74 3d 4d 69 63 72  T userAgent=Micr
0250   6f 73 6f 66 74 20 4f 66 66 69 63 65 2f 31 35 2e  osoft Office/15.
0260   30 20 28 57 69 6e 64 6f 77 73 20 4e 54 20 36 2e  0 (Windows NT 6.
0270   32 3b 20 4d 69 63 72 6f 73 6f 66 74 20 4f 75 74  2; Microsoft Out
0280   6c 6f 6f 6b 20 31 35 2e 30 2e 34 34 32 30 3b 20  look 15.0.4420; 
0290   50 72 6f 29 20 69 6e 3d 34 35 31 00              Pro) in=451.
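For anyone who wants to check the framing in the capture above, here is a minimal Python sketch that rebuilds the same payload and splits it on the real null byte. The names PREFIX, payload and events are just for illustration, and this only shows how the sender frames the data, not how Splunk processes it.

```python
# Rebuild the captured payload from the text visible in the hex dump.
# PREFIX is the part shared by all four events; only the "in=" value differs.
PREFIX = (
    b"<14>Jan 27 17:59:48 CEF:0|Sample Event|1.0|HTTP Transaction|1 "
    b"reqMethod=POST userAgent=Microsoft Office/15.0 "
    b"(Windows NT 6.2; Microsoft Outlook 15.0.4420; Pro) "
)

# Each event is terminated by a single NUL byte (0x00), as in the dump.
payload = b"".join(
    PREFIX + b"in=" + value + b"\x00"
    for value in (b"358", b"451", b"358", b"451")
)

# Splitting on the NUL byte leaves one empty trailing chunk, so drop empties.
events = [chunk for chunk in payload.split(b"\x00") if chunk]

for number, event in enumerate(events, start=1):
    print(number, event[-6:].decode("ascii"))
```

On the wire the separator really is a single 0x00 byte, so the four events are unambiguous at this level.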

This is what I see in Splunk Web:

<14>Jan 27 17:59:48 CEF:0|Sample Event|1.0|HTTP Transaction|1 reqMethod=POST userAgent=Microsoft Office/15.0 (Windows NT 6.2; Microsoft Outlook 15.0.4420; Pro) in=358\x00<14>Jan 27 17:59:48 CEF:0|Sample Event|1.0|HTTP Transaction|1 reqMethod=POST userAgent=Microsoft Office/15.0 (Windows NT 6.2; Microsoft Outlook 15.0.4420; Pro) in=451\x00<14>Jan 27 17:59:48 CEF:0|Sample Event|1.0|HTTP Transaction|1 reqMethod=POST userAgent=Microsoft Office/15.0 (Windows NT 6.2; Microsoft Outlook 15.0.4420; Pro) in=358\x00<14>Jan 27 17:59:48 CEF:0|Sample Event|1.0|HTTP Transaction|1 reqMethod=POST userAgent=Microsoft Office/15.0 (Windows NT 6.2; Microsoft Outlook 15.0.4420; Pro) in=451\x00
1 Solution

vsilchev
Explorer

Well, after a bunch of experiments I came up with the following solution:

If an event source sends events separated by a null character then, most likely, Splunk (or something else) replaces each null character with the literal string "\x00" before applying the line breaking and merging rules. In this case, add the pattern "\\x00" to the LINE_BREAKER property in props.conf. Note the double backslash: the pattern has to match the four-character string "\x00", not the null character itself.

For example:

LINE_BREAKER = (\\x00)
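To illustrate the difference between the two patterns, here is a small Python sketch. It uses Python's re module rather than Splunk's own regex engine, so treat it as an analogy only; the sample strings and the count_breaks helper are made up for the example.

```python
import re

# Hypothetical sample: the stream as it looks if every NUL byte has already
# been rewritten as the four visible characters backslash, x, 0, 0.
escaped = r"in=358\x00<14>Jan 27 17:59:48 CEF:0|in=451\x00"

# The same sample with real NUL characters, as in the packet capture.
raw = "in=358\x00<14>Jan 27 17:59:48 CEF:0|in=451\x00"

nul_pattern = re.compile(r"(\x00)")       # matches one real NUL character
literal_pattern = re.compile(r"(\\x00)")  # matches the literal text \x00


def count_breaks(pattern, text):
    """Return how many separators the pattern finds in the text."""
    return len(pattern.findall(text))


print(count_breaks(nul_pattern, escaped))      # 0
print(count_breaks(literal_pattern, escaped))  # 2
print(count_breaks(nul_pattern, raw))          # 2
print(count_breaks(literal_pattern, raw))      # 0
```

If the data reaching the line breaker already contains the escaped text, only the double-backslash pattern finds the separators, which matches what I observed.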

richgalloway
SplunkTrust

If your problem is solved, please accept an answer.


DalJeanis
Legend

All right, let's try a few more. Start with this one:

LINE_BREAKER = (?:\x00)
SHOULD_LINEMERGE = false
LEARN_MODEL = false

That's one way in Splunk to specify the null byte. Leave out the + for now. Hopefully that will get Splunk to split at ONLY the null byte, and the non-capturing group might help in some way (it was in one of the accepted answers to another question, referenced below).

Assuming that works, we'll then need something like this:

LINE_BREAKER = (?:[\r\n\x00]+)
SHOULD_LINEMERGE = false
LEARN_MODEL = false

A couple of references:
https://answers.splunk.com/answers/27720/understanding-line-breaker-regexes.html
https://answers.splunk.com/answers/154819/how-to-configure-line-breaker-regex-in-props-conf.html

This one might be very helpful. They use "SEDCMD-remove_nulls = s/\x00//g" to delete the nulls completely. You could do something similar and instead change the \x00 characters to \r\n, etc. It's also where I noticed the non-capturing group (?: ).
https://answers.splunk.com/answers/83790/how-do-i-remove-x00-characters-from-my-log-message.html

And the charset may be involved if that doesn't work:
https://answers.splunk.com/answers/106700/seing-null-x00-bytes-in-indexed-data-from-log-file-in-wind...

vsilchev
Explorer

First of all, thank you for your suggestions!

Unfortunately, using "LINE_BREAKER = (?:\x00)" doesn't solve the problem, at least in my case. I also thought about the charset, but according to the packet dump it is definitely ASCII and Splunk recognizes it correctly, so I don't think that is the issue.

Given that "LINE_BREAKER = (\\x00)" actually works, I suppose that Splunk (or something else) replaces the null character with the four-character sequence '\' 'x' '0' '0' before line breaking and merging happen.


DalJeanis
Legend

Sure! I saw a comment somewhere about that replacement of the null byte with a char representation, but I wasn't sure when and where it might apply. Glad you managed to work it out.

For those who come along later, please post the answer you found to your own question and mark it accepted.


DalJeanis
Legend

A small hex dump of the data at the boundaries between a few events would be helpful.


somesoni2
Revered Legend

Can we have some sample events?
