I have a .csv file with several fields. There are many date fields and text fields, but some fields are long blobs of text (such as the body of an e-mail); let's call such a field "longtext". The problem is that whenever Splunk encounters a newline character in the longtext field, it interprets it as a line break, which throws off the event breaks.
How can I get all of the text in "longtext" to be properly indexed without Splunk interpreting the newline as a line break?
Example:
create_time,request_id,username,longtext,responded_time,closed_time
2013-11-23 11:00,2322,johnsmith,Here is the long blob of text I was talking about. If i have a newline here: <newline>
Splunk sees it as a break in the log file and doesn't place the rest of this text in the longtext field,2013-11-23 13:43,2013-11-23 14:05
Any ideas?
If this is a DOS-format text file, you should be able to break only on CR-LF line endings with LINE_BREAKER (in props.conf). This assumes each record ends with CR-LF while the newlines inside longtext are bare LF:
[yoursourcetype]
LINE_BREAKER=(\r\n)
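To see what breaking only on CR-LF would mean for the sample data, here is a small Python sketch. It is not Splunk code; it only imitates the splitting rule. The sample string and the assumption that records end in CR-LF while the newline inside longtext is a bare LF are mine, so check your actual file before relying on this.

```python
import re

# Assumption: records end with CRLF, newlines inside longtext are bare LF.
raw = (
    "create_time,request_id,username,longtext,responded_time,closed_time\r\n"
    "2013-11-23 11:00,2322,johnsmith,first line of text\n"
    "second line of text,2013-11-23 13:43,2013-11-23 14:05\r\n"
)

# Split on CRLF only, like the LINE_BREAKER pattern above; bare LF is left alone.
events = [e for e in re.split(r"\r\n", raw) if e]

for event in events:
    print(repr(event))
```

The header and the data row come out as two events, and the data row keeps its embedded LF. If the file turns out to use bare LF for record endings as well, this rule cannot tell the two kinds of newline apart.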
The following seems to work for me with the sample data you gave (add it to props.conf):
[yoursourcetype]
INDEXED_EXTRACTIONS = csv
KV_MODE = none
MAX_TIMESTAMP_LOOKAHEAD = 50
NO_BINARY_CHECK = 1
SHOULD_LINEMERGE = true
pulldown_type = 1
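My guess as to why SHOULD_LINEMERGE = true helps here: with line merging on, Splunk by default starts a new event only at a line that contains a timestamp (BREAK_ONLY_BEFORE_DATE defaults to true), so a continuation line of longtext gets attached to the previous event. The Python sketch below imitates that idea only; the merge_lines helper, the timestamp regex, and the second sample row are made up for illustration and are not how Splunk implements it.

```python
import re

# Hypothetical pattern for the create_time format in the sample data.
TIMESTAMP = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}")

def merge_lines(lines):
    """Attach each line that does not start with a timestamp to the previous event."""
    events = []
    for line in lines:
        if events and not TIMESTAMP.match(line):
            events[-1] += "\n" + line  # continuation of the previous event
        else:
            events.append(line)  # line starts with a timestamp: new event
    return events

lines = [
    "2013-11-23 11:00,2322,johnsmith,first line of text",
    "second line of text,2013-11-23 13:43,2013-11-23 14:05",
    "2013-11-23 12:00,2323,janedoe,short text,2013-11-23 12:30,2013-11-23 12:45",
]
events = merge_lines(lines)
print(len(events))
```

One caveat with this kind of rule: if a line inside longtext happens to begin with something that looks like a date, it would be treated as the start of a new event.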