Why does indexing remove carriage return characters (0x0d)?

hannus
Explorer

Example data in a file which should become a multi line event:
111111
222222

Both lines end with CR+LF (0x0d+0x0a), this is on Windows 7.

I create a new index for this. I import this data file using the Add Data wizard in Splunk Enterprise. I even let it use the defaults by not specifying a source type and letting the wizard create one. Then I open the file "0" in the program folder "rawdata" with a hex editor. There I can see that 0x0d has been removed, while 0x0a is still in the raw data file. In other words, the carriage return is removed but the newline is not.
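As a cross-check for the hex editor inspection described above, the line endings in a byte string can also be counted programmatically. This is only a minimal sketch: the helper name `count_line_endings` is made up for illustration, and the `indexed` bytes are a hypothetical stand-in for what the hex editor showed, not data read from a real Splunk rawdata file.

```python
def count_line_endings(data: bytes) -> dict:
    """Count CR+LF pairs, lone CR bytes and lone LF bytes in a byte string."""
    crlf = data.count(b"\r\n")
    return {
        "crlf": crlf,
        "lone_cr": data.count(b"\r") - crlf,
        "lone_lf": data.count(b"\n") - crlf,
    }

# The two-line sample from the question, with Windows line endings.
original = b"111111\r\n222222\r\n"
# Hypothetical stand-in for the indexed bytes: CR gone, LF kept.
indexed = b"111111\n222222\n"

print(count_line_endings(original))
print(count_line_endings(indexed))
```

For a real file, the bytes would be read with `open(path, "rb").read()` so that no newline translation happens on Windows.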

Is this normal Splunk Enterprise functionality? Or do I have some setting that causes it? I can't figure this out...

Thanks in advance!

jkat54
SplunkTrust

You need to study what is called event breaking.

By default Splunk breaks your data into individual events. The default line breaker is ([\r\n]+), and everything matched by the capture group is discarded.
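To illustrate what "everything in the capture group is discarded" means for the sample data, here is a rough Python simulation. It is a sketch of the regex semantics only, not Splunk's actual implementation; the function name `break_events` is hypothetical, and it ignores line merging and every other props.conf setting.

```python
import re

# The default LINE_BREAKER pattern quoted above, applied to bytes.
DEFAULT_LINE_BREAKER = re.compile(rb"([\r\n]+)")

def break_events(data: bytes, breaker: re.Pattern) -> list:
    """Split data wherever the breaker matches, dropping the text of group 1."""
    events = []
    start = 0
    for match in breaker.finditer(data):
        # Text before the capture group belongs to the current event.
        events.append(data[start:match.start(1)])
        # The next event starts right after the discarded capture group.
        start = match.end(1)
    if start < len(data):
        events.append(data[start:])
    return events

print(break_events(b"111111\r\n222222\r\n", DEFAULT_LINE_BREAKER))
# prints [b'111111', b'222222']
```

Both the CR and the LF fall inside the capture group, so neither survives the split in this simulation.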

If you want to preserve the format you need to customize your props.conf. Something like this might work:

[sourcetypeName]
SHOULD_LINEMERGE = true
MUST_BREAK_AFTER = Randomstring
TRUNCATE = 9999999

I think there is a specific props.conf setting for end of file as the line breaker/must break before setting.

hannus
Explorer

I just can't make this work.

I get the meaning of LINE_BREAKER and its default value. It is clearly meant for log files where every line ends with some combination of CR and LF, and it removes them so that only the real data is kept, not the end-of-line characters. So I thought that if I set LINE_BREAKER to some random string, Splunk would not find it and would keep the CR and LF instead of trying to replace them. But no, it changes them anyway.

So I guess there is something (actually a lot, since I'm a newbie) that I'm not understanding. I wonder if anyone could solve this.

jkat54
SplunkTrust

Where is the props.conf and how are you ingesting the data?

hannus
Explorer

While working on this problem I'm using the "Add Data" wizard from the main UI, on Splunk Enterprise version 6.4.1.

props.conf:

[A_MyTestSourcetype]
NO_BINARY_CHECK = false
category = Custom
description = Testing CR
pulldown_type = 1
disabled = false
SHOULD_LINEMERGE = false
LINE_BREAKER = (ABabABab)
TRUNCATE = 9999999

File (for example, where "cr", "lf" and "crlf" mark the positions of those characters):

111 cr 11111 crlf
22 cr 222222 crlf
33333333 crlf
444 lf 44444

In this best case, the CRs inside a line are kept, but the ones at the end of a line are removed.

Thanks for taking the time to help!

jkat54
SplunkTrust

How about this:

[mysinglefilesourcetype]
SHOULD_LINEMERGE = false
LINE_BREAKER = ((*FAIL))
TRUNCATE = 99999999

https://answers.splunk.com/answers/106075/each-file-as-one-single-splunk-event.html
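The idea behind ((*FAIL)) is a pattern that can never match, so no break point is ever found and the whole file stays in one piece. A small Python sketch of that idea follows. Note the assumption: Python's `re` module has no PCRE backtracking verbs such as (*FAIL), so the empty negative lookahead `(?!)` is used here as a stand-in that likewise always fails. The sample bytes are made up, and this does not model how Splunk itself processes the setting.

```python
import re

# Assumption: (?!) stands in for PCRE's (*FAIL); both can never match.
NEVER_MATCHES = re.compile(rb"((?!))")

# Made-up sample with CR inside the lines and CR+LF at the end of each line.
data = b"111 \r 11111 \r\n22 \r 222222 \r\n"

# No match anywhere, so there is nothing to split on or discard.
print(NEVER_MATCHES.search(data) is None)
print(NEVER_MATCHES.split(data) == [data])
```

Because nothing matches, the capture group never captures anything, and every CR and LF byte stays in the single resulting piece.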

hannus
Explorer

That LINE_BREAKER = ((*FAIL)) seems to do the trick. Splunk now indexes the imported file correctly (as seen from the "0" file in "rawdata" folder).
I suppose you don't know how to export data exactly how it is in the index file...? My export tests (from GUI) show:
CR -> CR (ok)
LF -> LF (ok)
LF+CR -> LF+CR (ok) BUT
CR+LF -> LF (fail)
I need to do some more digging on exporting data... I suppose I will create a whole other question for this. Thank you very much for your help on this!
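One way to pin down export differences like the CR+LF case above is a byte-level comparison of the original file and the exported file. The following is a minimal sketch; the helper name `first_difference` is hypothetical, and the `exported` bytes are an assumed example of the CR+LF -> LF result, not actual Splunk output.

```python
def first_difference(a: bytes, b: bytes):
    """Return the offset of the first differing byte, or None if identical."""
    for offset, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return offset
    if len(a) != len(b):
        # One input is a prefix of the other; they differ where the shorter ends.
        return min(len(a), len(b))
    return None

original = b"111111\r\n222222\r\n"
# Assumed export result in which each CR+LF became a bare LF.
exported = b"111111\n222222\n"

print(first_difference(original, exported))  # prints 6
```

Offset 6 is where the original has 0x0d and the assumed export already has 0x0a.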

jkat54
SplunkTrust

Yeah unfortunately I claim no expertise for the file export issue. Can you open another question with just that there and I'll upvote / me-too it? I think you'll want to submit a ticket for that. You might also consider exporting via the API to see if the behavior is different.

Do you mind marking my answer as the answer to your main question?

Thanks,
Michael

hannus
Explorer

I am currently evaluating the product without a paid license, so I guess I'm not in a position to submit a ticket...

jkat54
SplunkTrust

Also, LINE_BREAKER must have a capture group that will be discarded. For example:

(RandomStringThatDoesntOccurInYourData)

hannus
Explorer

And while this is out in the open, why does Splunk add a newline (0x0a) character at the end of the export (at least from the GUI)? I need to get data in and out unchanged!