We have a log that starts indexing fine, but Splunk stops reading it when only about half has been indexed. The log is 4.6 MB and in XML format.
In the data previewer:
- 500 events read
- Bytes = 4,535,856
- Events = 2,877 (about half)
You're probably triggering MAX_EVENTS while Splunk stitches the lines back together. By default Splunk breaks on newlines, then attempts to "line-merge" those lines back into the context of the main event (think of a stack trace spanning multiple lines). By default it will only seam together 257 lines (the original plus 256 more) this way. It's more efficient to tell Splunk "yo, just consume the whole file" rather than rely on that split-and-recombine behavior.
For a file that is a single, complete XML document, I'd suggest setting LINE_BREAKER to "just grab everything" with `^()$`, and disabling the line-merging functionality with `SHOULD_LINEMERGE = false`.
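Put together, a minimal props.conf stanza might look like this (a sketch; the sourcetype name `my_xml_file` is a placeholder, substitute your own):

```ini
# props.conf -- sourcetype name below is a placeholder
[my_xml_file]
# A breaker that never fires mid-file, so the whole file becomes one event
LINE_BREAKER = ^()$
# Skip the split-then-recombine line-merging step entirely
SHOULD_LINEMERGE = false
```

With line merging disabled, MAX_EVENTS no longer comes into play, since Splunk is no longer counting merged lines.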
Just to be clear... you're quoting statistics from the data previewer, but you don't actually mean the preview, do you? Splunk only previews a subsection of the data, not the whole file. You mean Splunk isn't actually indexing the whole file, correct, not just that it's only previewing half?
If Splunk has stopped reading the file before the end, there should be some reason for stopping indicated in splunkd.log, since Splunk will be taking action on the index.
TRUNCATE only comes into play if Splunk is suddenly seeing one very long line for some reason... Could you try adding TRUNCATE = 0 to props.conf?
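As a sketch, assuming the same placeholder sourcetype name, the addition would be:

```ini
# props.conf -- sourcetype name is a placeholder
[my_xml_file]
# 0 disables line truncation entirely (the default limit is 10000 bytes)
TRUNCATE = 0
```

Setting it to 0 is mainly a diagnostic step here: if the whole file then indexes, truncation of an unexpectedly long line was the culprit.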
Also check splunkd.log:
$SPLUNK_HOME/var/log/splunk/splunkd.log
There might be something interesting in there...
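A quick way to hunt for relevant messages is to grep the log for breaking/truncation warnings. This is a sketch against a simulated log line so it's self-contained; the grep pattern and the sample message wording are assumptions, not guaranteed splunkd.log output:

```shell
# Path to splunkd.log (assumes the default $SPLUNK_HOME layout)
LOGFILE="${SPLUNK_HOME:-/opt/splunk}/var/log/splunk/splunkd.log"

# Against the real file you would run:
#   grep -iE 'truncat|line_break|aggregat' "$LOGFILE"

# Simulated log line so this sketch runs anywhere:
sample='WARN  AggregatorMiningProcessor - Breaking event because limit of 256 lines has been exceeded'

# Count matching lines (case-insensitive)
matches=$(printf '%s\n' "$sample" | grep -ciE 'truncat|line_break|aggregat')
echo "$matches"
```

Any hits around the timestamp where indexing stopped should point at which limit (line merging vs. truncation) was tripped.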